Re: Spark 2.2.0 GC Overhead Limit Exceeded and OOM errors in the executors

2017-10-29 Thread mmdenny
Hi Supun, Did you look at https://spark.apache.org/docs/latest/tuning.html? In addition to the info there, if you're partitioning by some key where you've got a lot of data skew, one task's memory requirement may be larger than the RAM of a given executor, while the rest of the tasks ...
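
A minimal sketch of working around that kind of skew by salting the hot key (the column names, input path, and salt factor are hypothetical, not from this thread):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("skew-demo").getOrCreate()
    val df = spark.read.parquet("/data/events") // hypothetical input

    // Appending a random salt to the hot key splits one oversized group
    // across many tasks, so no single task must hold the whole group in RAM.
    val salted = df.withColumn("salt", (rand() * 32).cast("int"))
    val partial = salted.groupBy(col("userId"), col("salt"))
      .agg(sum("bytes").as("partialBytes"))
    val result = partial.groupBy("userId")
      .agg(sum("partialBytes").as("totalBytes"))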

Spark 2.2.0 GC Overhead Limit Exceeded and OOM errors in the executors

2017-10-27 Thread Supun Nakandala
pipeline is iterative. I get OOM errors and GC-overhead-limit-exceeded errors, and I fix them by increasing the heap size or the number of partitions, even though after doing that there is still high GC pressure. I know that my partitions should be small enough that each fits in memory. But when ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Zoltan Fedor
1.5G, it means almost nothing left for other things. True, I have also tried with memoryOverhead set to 800 (10% of the 8Gb memory), but no difference; the "GC overhead limit exceeded" is still the same. Python ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Davies Liu
... could take 1.5G, it means almost nothing left for other things. True, I have also tried with memoryOverhead set to 800 (10% of the 8Gb memory), but no difference; the "GC overhead limit exceeded" is still the same. Python UDFs do require some ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Zoltan Fedor
let's say 800 million records, then all works well. But when I run the same SQL on all the data, then I receive "java.lang.OutOfMemoryError: GC overhead limit exceeded" from basically all of the executors. It seems to me ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-08 Thread Davies Liu
on which we would do some calculation using UDFs in pyspark. If I run my SQL on only a portion of the data (filtering by one of the attributes), let's say 800 million records, then all works well. But when I run the same SQL on all the data, then I receive "java.lang.OutOfMemoryError: ...

java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-08 Thread Zoltan Fedor
...only a portion of the data (filtering by one of the attributes), let's say 800 million records, then all works well. But when I run the same SQL on all the data, then I receive "java.lang.OutOfMemoryError: GC overhead limit exceeded" from basically all of the executors. It seems to me ...

Re: GC overhead limit exceeded

2016-05-16 Thread Takeshi Yamamuro
...GB), desiredKV=7.19 GB OOM! 05-16 15:14:10.215 127.0.0.1:54321 2059 #e Thread WARN: Swapping! GC CALLBACK, (K/V:29.74 GB + POJO:16.90 GB + FREE:10.86 GB == MEM_MAX:57.50 GB), desiredKV=7.19 GB OOM! 05-16 15:14:3...

Re: GC overhead limit exceeded

2016-05-16 Thread Aleksandr Modestov
...of partitions. // maropu On Mon, May 16, 2016 at 10:00 PM, AlexModestov <aleksandrmodes...@gmail.com> wrote: > I get the error in the apache spark... "spark.driver.memory 60g spark.python.worker.memory 60g spark.master ...
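
A minimal sketch of that suggestion in spark-shell Scala (the path and partition count are illustrative; sqlContext is the shell's SQLContext):

    // More, smaller partitions mean each task materializes less data at a
    // time, which lowers peak heap usage and GC pressure.
    val df = sqlContext.read.parquet("/path/to/data") // hypothetical input
    val repartitioned = df.repartition(200)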

Re: GC overhead limit exceeded

2016-05-16 Thread Takeshi Yamamuro
...the error in the apache spark... "spark.driver.memory 60g spark.python.worker.memory 60g spark.master local[*]" The amount of data is about 5Gb, but spark says that "GC overhead limit exceeded". I guess that my conf file gives enough resources. ...

GC overhead limit exceeded

2016-05-16 Thread AlexModestov
I get the error in the apache spark... "spark.driver.memory 60g spark.python.worker.memory 60g spark.master local[*]" The amount of data is about 5Gb, but spark says that "GC overhead limit exceeded". I guess that my conf file gives enough resources. "16/05/16 15:13:02 ...

_metadata file throwing a "GC overhead limit exceeded" after a write

2016-02-12 Thread Maurin Lenglart
...Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. Exception in thread "qtp1919278883-98" java.lang.OutOfMemoryError ...

Re: Lost tasks due to OutOfMemoryError (GC overhead limit exceeded)

2016-01-12 Thread Muthu Jayakumar
t;500"); > > I'm trying to load data from hdfs and running some sqls on it (mostly > groupby) using DataFrames. The logs keep saying that tasks are lost due to > OutOfMemoryError (GC overhead limit exceeded). > > Can you advice what is the recommended settings (memory, cores, > partitions, etc.) for the given hardware? > > Thanks! >

Lost tasks due to OutOfMemoryError (GC overhead limit exceeded)

2016-01-12 Thread Barak Yaish
..."spark.kryo.registrationRequired","true"); sparkConf.set("spark.kryoserializer.buffer.max.mb","512"); sparkConf.set("spark.default.parallelism","300"); sparkConf.set("spark.rpc.askTimeout","500"); I'm trying to load data from HDFS and run some SQL ...
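
The quoted settings, assembled into a runnable sketch (the serializer line and app name are assumptions; the Kryo options only take effect with the KryoSerializer enabled):

    import org.apache.spark.{SparkConf, SparkContext}

    val sparkConf = new SparkConf()
      .setAppName("hdfs-sql-job") // hypothetical name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") // assumed
      .set("spark.kryo.registrationRequired", "true")
      .set("spark.kryoserializer.buffer.max.mb", "512")
      .set("spark.default.parallelism", "300")
      .set("spark.rpc.askTimeout", "500")
    val sc = new SparkContext(sparkConf)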

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Adrien Mogenet
Spark tasks are done. After the Spark tasks are done, the job appears to be running for over an hour, until I get the following (full stack trace below): java.lang.OutOfMemoryError: GC overhead limit exceeded at ...

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Cheng Lian
dataset successfully. I can see the output in HDFS once all Spark tasks are done. After the spark tasks are done, the job appears to be running for over an hour, until I get the following (full stack trace below): java.lang.OutOfMemoryError: GC overhead

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Jerry Lam
...successfully. I can see the output in HDFS once all Spark tasks are done. After the Spark tasks are done, the job appears to be running for over an hour, until I get the following (full stack trace below): ...

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Don Drake
...full stack trace below): java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:238) I had set the driver memory to be 20GB. I attempted to read in ...

df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-28 Thread Don Drake
hour, until I get the following (full stack trace below): java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:238) I had set the driver memory to be 20GB. I attempted to
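
A hedged sketch of the kind of write under discussion (paths and the partition column are hypothetical). The stack trace points at Parquet metadata handling on the driver, which is why the poster raised driver memory rather than executor memory:

    // One output directory per distinct value of "day"; with many partitions
    // and files, the driver collects per-file Parquet metadata, which is
    // where the quoted OOM occurred.
    val df = sqlContext.read.parquet("/in/data") // hypothetical input
    df.write.partitionBy("day").parquet("/out/data")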

newbie simple app, small data set: Py4JJavaError java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-18 Thread Andy Davidson
'An error occurred while calling {0}{1}{2}.\n'. --> 300 format(target_id, '.', name), value) 301 else: 302 raise Py4JError( Py4JJavaError: An error occurred while calling o65.partitions. : java.lang.OutOfMemoryError: GC overhead limit exceeded ...

RE: [sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-07 Thread Sun, Rui
...Dhaval Patel [mailto:dhaval1...@gmail.com] Sent: Saturday, November 7, 2015 12:26 AM To: Spark User Group Subject: [sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit exceeded. I have been struggling with this error for the past 3 days and have tried all possible ways/suggestions ...

[sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-06 Thread Dhaval Patel
...broadcast_2_piece0 on localhost:39562 in memory (size: 2.4 KB, free: 530.0 MB) 15/11/06 10:45:20 INFO ContextCleaner: Cleaned accumulator 2 15/11/06 10:45:53 WARN ServletHandler: Error for /static/timeline-view.css java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.zip.Zip...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-10-04 Thread Ted Yu
1.2.0 is quite old. You may want to try 1.5.1, which was released in the past week. Cheers > On Oct 4, 2015, at 4:26 AM, t_ras wrote: I get java.lang.OutOfMemoryError: GC overhead limit exceeded when trying a count action on a file. The file is a CSV file, 217GB in size ...

java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-10-04 Thread t_ras
I get java.lang.OutOfMemoryError: GC overhead limit exceeded when trying a count action on a file. The file is a CSV file, 217GB in size. I'm using 10 r3.8xlarge (Ubuntu) machines, CDH 5.3.6 and Spark 1.2.0. Configuration: spark.app.id:local-1443956477103 spark.app.name:Spark shell spark.cores.max ...
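
A hedged sketch of the failing action (the path and partition count are illustrative, not from the thread):

    // Requesting more input partitions keeps each task's slice of the 217GB
    // file small enough to process without exhausting the heap.
    val lines = sc.textFile("hdfs:///data/big.csv", 2000) // hypothetical path
    println(lines.count())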

Re: Spark executor lost because of GC overhead limit exceeded even though using 20 executors using 25GB each

2015-08-18 Thread Ted Yu
...mad as I am new to Spark. Thanks in advance. WARN scheduler.TaskSetManager: Lost task 7.0 in stage 363.0 (TID 3373, myhost.com): java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.spark.sql.types.UTF8String.toString ...

Spark executor lost because of GC overhead limit exceeded even though using 20 executors using 25GB each

2015-08-18 Thread unk1102
...Rpc client disassociated, shuffle not found, etc. Please help me solve this; I am getting mad as I am new to Spark. Thanks in advance. WARN scheduler.TaskSetManager: Lost task 7.0 in stage 363.0 (TID 3373, myhost.com): java.lang.OutOfMemoryError: GC overhead limit exceeded at ...

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-08-11 Thread rene.pfitzner
...Sent: Saturday, 11 July 2015 03:58 To: Ted Yu; Robin East; user Subject: Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded. Hello again. So I could compute triangle numbers when running the code from the spark shell without workers (with the --driver-memory 15g option ...

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Cody Koeninger
on spark.driver.memory, I'll keep testing (at 2g things seem OK for now). On Mon, Aug 10, 2015 at 12:10 PM, Cody Koeninger wrote: ...

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Dmitry Goldenberg
...On Mon, Aug 10, 2015 at 12:10 PM, Cody Koeninger wrote: That looks like it's during recovery from a checkpoint, so it'd be driver memory, not executor memory ...

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Cody Koeninger
...wrote: That looks like it's during recovery from a checkpoint, so it'd be driver memory, not executor memory. How big is the checkpoint directory that you're trying to restore from? ...

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Dmitry Goldenberg
...from? On Mon, Aug 10, 2015 at 10:57 AM, Dmitry Goldenberg <dgoldenberg...@gmail.com> wrote: We're getting the below error. Tried increasing ...

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Ted Yu
...that you're trying to restore from? On Mon, Aug 10, 2015 at 10:57 AM, Dmitry Goldenberg <dgoldenberg...@gmail.com> wrote: We're getting the below error. Tried increasing ...

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Cody Koeninger
How big is the checkpoint directory that you're trying to restore from? On Mon, Aug 10, 2015 at 10:57 AM, Dmitry Goldenberg <dgoldenberg...@gmail.com> wrote: We're getting the below error. Tried ...

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Dmitry Goldenberg
...from? On Mon, Aug 10, 2015 at 10:57 AM, Dmitry Goldenberg <dgoldenberg...@gmail.com> wrote: We're getting the below error. Tried increasing spark.executor.memory, e.g. from 1g to 2g, but the below ...

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Ted Yu
...Dmitry Goldenberg <dgoldenberg...@gmail.com> wrote: We're getting the below error. Tried increasing spark.executor.memory, e.g. from 1g to 2g, but the below error still happens. Any recommendations? Something to do with specifying -Xmx in the submit job scripts? ...

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Dmitry Goldenberg
...wrote: We're getting the below error. Tried increasing spark.executor.memory, e.g. from 1g to 2g, but the below error still happens. Any recommendations? Something to do with specifying -Xmx in the submit job scripts? Thanks ...

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Cody Koeninger
We're getting the below error. Tried increasing spark.executor.memory, e.g. from 1g to 2g, but the below error still happens. Any recommendations? Something to do with specifying -Xmx in the submit job scripts? Thanks. Exception in thread "main" ...

How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Dmitry Goldenberg
We're getting the below error. Tried increasing spark.executor.memory, e.g. from 1g to 2g, but the below error still happens. Any recommendations? Something to do with specifying -Xmx in the submit job scripts? Thanks. Exception in thread "main" java.lang.OutOfMemoryError: GC ...
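
Cody's diagnosis in the replies above is that checkpoint recovery runs in the driver, so the relevant knob is driver memory rather than spark.executor.memory. A sketch of the submit line (the value, class, and jar name are illustrative):

    spark-submit --driver-memory 2g --class com.example.StreamingJob app.jar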

Re: Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Saeed Shahrivari
html and in the reduce phase we keep the html that has the shortest URL. However, after running for 2-3 hours the application crashes due to a memory issue. Here is the exception: 15/07/15 18:24:05 WARN scheduler.TaskSetManager: Lost task 267.0 in stage ...

Re: Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Ted Yu
...SHA-1 signature from the html and in the reduce phase we keep the html that has the shortest URL. However, after running for 2-3 hours the application crashes due to a memory issue. Here is the exception: 15/07/15 18:24:05 WARN scheduler.TaskSetManager: Lost task 267.0 in stag...

Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Saeed Shahrivari
...keep the html that has the shortest URL. However, after running for 2-3 hours the application crashes due to a memory issue. Here is the exception: 15/07/15 18:24:05 WARN scheduler.TaskSetManager: Lost task 267.0 in stage 0.0 (TID 267, psh-11.nse.ir): java.lang.OutOfMemoryError: GC overhead limit exceeded

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-10 Thread Roman Sokolov
Hello again. So I could compute triangle numbers when running the code from the spark shell without workers (with the --driver-memory 15g option), but with workers I have errors. So I run the spark shell: ./bin/spark-shell --master spark://192.168.0.31:7077 --executor-memory 6900m --driver-memory 15g and workers (...

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Roman Sokolov
Yep, I already found it. So I added one line: val graph = GraphLoader.edgeListFile(sc, "", ...) val newgraph = graph.convertToCanonicalEdges() and could successfully count triangles on "newgraph". Next I will test it on bigger (several Gb) networks. I am using Spark 1.3 and 1.4 but haven't seen ...
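
The complete recipe from this thread as a runnable sketch (the edge-list path is hypothetical):

    import org.apache.spark.graphx.GraphLoader

    // triangleCount() requires srcId < dstId on every edge;
    // convertToCanonicalEdges() (Spark 1.3+) rewrites and deduplicates
    // edges so that the invariant holds.
    val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt")
    val canonical = graph.convertToCanonicalEdges()
    val triangles = canonical.triangleCount()
    // Each triangle is counted once at each of its three vertices.
    println(triangles.vertices.map(_._2).reduce(_ + _) / 3)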

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Ted Yu
See SPARK-4917, which went into Spark 1.3.0. On Fri, Jun 26, 2015 at 2:27 AM, Robin East wrote: > You'll get this issue if you just take the first 2000 lines of that file. The problem is that triangleCount() expects srcId < dstId, which is not the case in the file (e.g. vertex 28). You can get round ...

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Robin East
You’ll get this issue if you just take the first 2000 lines of that file. The problem is that triangleCount() expects srcId < dstId, which is not the case in the file (e.g. vertex 28). You can get round this by calling graph.convertToCanonicalEdges(), which removes bi-directional edges and ensures srcId < dstId ...

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Roman Sokolov
Ok, but what does it mean? I did not change the core files of Spark, so is it a bug there? PS: on small datasets (<500 Mb) I have no problem. On 25.06.2015 18:02, "Ted Yu" wrote: > The assertion failure from TriangleCount.scala corresponds with the following lines: g.outerJoinVertices ...

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-25 Thread Ted Yu
The assertion failure from TriangleCount.scala corresponds with the following lines: g.outerJoinVertices(counters) { (vid, _, optCounter: Option[Int]) => val dblCount = optCounter.getOrElse(0) // double count should be even (divisible by two) assert((dblCount & 1) == 0) ...

Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-25 Thread Roman Sokolov
Hello! I am trying to compute the number of triangles with GraphX, but I get a memory/heap-size error even though the dataset is very small (1Gb). I run the code in spark-shell on a machine with 16Gb RAM (I also tried with 2 workers on separate machines, 8Gb RAM each). So I have 15x more memory than the data ...

Re: Spark job throwing “java.lang.OutOfMemoryError: GC overhead limit exceeded”

2015-06-15 Thread Deng Ching-Mallete
...have a Spark job that throws "java.lang.OutOfMemoryError: GC overhead limit exceeded". The job is trying to process a file of size 4.5G. I've tried the following spark configuration: --num-executors 6 --executor-memory 6G --executor-cores 6 --driver-memory 3G I tried i...

Spark job throwing “java.lang.OutOfMemoryError: GC overhead limit exceeded”

2015-06-15 Thread diplomatic Guru
Hello All, I have a Spark job that throws "java.lang.OutOfMemoryError: GC overhead limit exceeded". The job is trying to process a file of size 4.5G. I've tried the following spark configuration: --num-executors 6 --executor-memory 6G --executor-cores 6 --driver-memory 3G I tried ...

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-23 Thread Patrick Wendell
...processed in a task, which triggers the problem. How many files does your dataset have and how large is a file? It seems your query will be executed with two stages, ...

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-23 Thread Martin Goodson
...Thanks, Yin. On Wed, Mar 18, 2015 at 11:42 AM, Yiannis Gkoufas <...

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-23 Thread Yiannis Gkoufas
...Yin. On Wed, Mar 18, 2015 at 11:42 AM, Yiannis Gkoufas <johngou...@gmail.com> wrote: Hi there, I ...

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-20 Thread Yin Huai
...On 18 March 2015 at 13:59, Cheng Lian wrote: You should probably increase executor memory by setting "spark.executor.memory". ...
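
Cheng's suggestion in spark-defaults form, matching the configuration style quoted elsewhere in this digest (the value is illustrative):

    # Per-executor JVM heap; must be set before the application starts,
    # not from inside a running job.
    spark.executor.memory  4g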

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-20 Thread Yiannis Gkoufas
...The full list of available configurations can be found here: http://spark.apache.org/docs/latest/configuration.html Cheng ...

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-20 Thread Yiannis Gkoufas
...On 3/18/15 9:15 PM, Yiannis Gkoufas wrote: Hi there, I was trying the new DataFrame API with some basic operations on ...

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-19 Thread Yiannis Gkoufas
...On 3/18/15 9:15 PM, Yiannis Gkoufas wrote: Hi there, I was trying the new DataFrame API with some basic operations on a parquet dataset. I have ...

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-19 Thread Yin Huai
...Hi there, I was trying the new DataFrame API with some basic operations on a parquet dataset. I have 7 nodes of 12 cores and 8GB RAM allocated to each worker in a ...

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-18 Thread Yiannis Gkoufas
...parquet dataset. I have 7 nodes of 12 cores and 8GB RAM allocated to each worker in a standalone cluster mode. The code is the following: val people = sqlContext.parquetFile("...

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-18 Thread Yin Huai
standalone cluster mode. The code is the following: val people = sqlContext.parquetFile("/data.parquet"); val res = people.groupBy("name","date").agg(sum("power"),sum("supply")) ...

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-18 Thread Yiannis Gkoufas
...RAM allocated to each worker in a standalone cluster mode. The code is the following: val people = sqlContext.parquetFile("/data.parquet"); val res = people.groupBy("name","date").agg(sum("power"),sum("supply")) ...

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-18 Thread Cheng Lian
date").agg(sum("power"),sum("supply")).take(10); System.out.println(res); The dataset consists of 16 billion entries. The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded My configuration is: spark.serializer org.apache.spark.serializer.KryoSerializer s

DataFrame operation on parquet: GC overhead limit exceeded

2015-03-18 Thread Yiannis Gkoufas
...people.groupBy("name","date").agg(sum("power"),sum("supply")).take(10); System.out.println(res); The dataset consists of 16 billion entries. The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded. My configuration is: spark.serial...
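
The query from this thread, reassembled from the snippets above into a runnable sketch (Spark 1.3-era API, as in the original post):

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions._

    val sqlContext = new SQLContext(sc)
    val people = sqlContext.parquetFile("/data.parquet")
    // Two grouping columns and two aggregates over 16 billion rows; the
    // per-task aggregation buffers drive the executor heap usage here.
    val res = people.groupBy("name", "date")
      .agg(sum("power"), sum("supply"))
      .take(10)
    res.foreach(println)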

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-02 Thread Sabarish Sasidharan
Thanks Debasish, Reza and Pat. In my case, I am doing an SVD and then doing the similarities computation, so a rowSimilarities() would be a good fit; looking forward to it. In the meantime I will try to see if I can further limit the number of similarities computed in some other fashion, or ...

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-02 Thread Pat Ferrel
Sab, not sure what you require for the similarity metric or your use case but you can also look at spark-rowsimilarity or spark-itemsimilarity (column-wise) here http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-02 Thread Reza Zadeh
Hi Sab, The current method is optimized for having many rows and few columns. In your case it is exactly the opposite. We are working on your case, tracked by this JIRA: https://issues.apache.org/jira/browse/SPARK-4823 Your case is very common, so I will put some time into building it. In the mean

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-01 Thread Debasish Das
Column-based similarities work well if the number of columns is mild (10K, 100K; we actually scaled it to 1.5M columns, but that really stress-tests the shuffle and you need to tune the shuffle parameters)... You can either use dimsum sampling or come up with your own threshold based on your application that you ...
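
A minimal sketch of the threshold-based DIMSUM call under discussion (toy data; the threshold value is illustrative):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 0.0, 2.0),
      Vectors.dense(0.0, 1.0, 1.0),
      Vectors.dense(3.0, 0.0, 1.0)))
    val mat = new RowMatrix(rows)
    // A positive threshold enables DIMSUM sampling: column pairs whose
    // cosine similarity is likely below it are dropped, which bounds the
    // shuffle size and the number of ((i, j), value) pairs produced.
    val sims = mat.columnSimilarities(0.1)
    sims.entries.collect().foreach(println)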

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-01 Thread Sabarish Sasidharan
Hi Reza, I see that ((int, int), double) pairs are generated for any combination that meets the criteria controlled by the threshold. But assuming a simple 1x10K matrix, that means I would need at least 12GB of memory per executor for the flat map, just for these pairs, excluding any other overhead. Is ...

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-01 Thread Reza Zadeh
Hi Sab, In this dense case, the output will contain 10K x 10K entries, i.e. 100 million doubles, which doesn't fit in 1GB with overheads. For a dense matrix, similarColumns() scales quadratically in the number of columns, so you need more memory across the cluster. Reza

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-01 Thread Sabarish Sasidharan
Sorry, I actually meant 30 x 1 matrix (missed a 0) Regards Sab

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-01 Thread Reza Zadeh
...all in one partition. I am running on a single node of 15G, giving the driver 1G and the executor 9G. This is on a single-node hadoop. In the first attempt the BlockManager doesn't respond within the heartbeat interval. In the second attempt I am seeing a GC overhead limit exceeded error ...

Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-01 Thread Sabarish Sasidharan
BlockManager doesn't respond within the heartbeat interval. In the second attempt I am seeing a GC overhead limit exceeded error. And it is almost always in RowMatrix.columnSimilaritiesDIMSUM -> mapPartitionsWithIndex (line 570) java.lang.OutOfMemoryError: GC overhead limit exceeded

Re: loads of memory still GC overhead limit exceeded

2015-02-20 Thread Ilya Ganelin
...This is referenced in a JIRA I can't find at the moment. On Thu, Feb 19, 2015 at 5:10 AM Antony Mayi wrote: > now with spark.shuffle.io.preferDirectBufs reverted (to true), getting GC overhead limit exceeded again: ...

Re: loads of memory still GC overhead limit exceeded

2015-02-20 Thread Xiangrui Meng
...5:10 AM Antony Mayi wrote: now with spark.shuffle.io.preferDirectBufs reverted (to true), getting GC overhead limit exceeded again: === spark stdout === 15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 18.0 (TID 5329, 192....

Re: loads of memory still GC overhead limit exceeded

2015-02-20 Thread Antony Mayi
...Feb 19, 2015 at 5:10 AM Antony Mayi wrote: now with spark.shuffle.io.preferDirectBufs reverted (to true), getting GC overhead limit exceeded again: === spark stdout === 15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 18.0 (TID 5329, 192.168.1.93): java.lang.OutOfMemoryError: GC

Re: loads of memory still GC overhead limit exceeded

2015-02-19 Thread Ilya Ganelin
...(to true) getting again GC overhead limit exceeded: === spark stdout === 15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 18.0 (TID 5329, 192.168.1.93): java.lang.OutOfMemoryError: GC overhead limit exceeded at java.io.ObjectInputStrea...

Re: loads of memory still GC overhead limit exceeded

2015-02-19 Thread Antony Mayi
now with spark.shuffle.io.preferDirectBufs reverted (to true), getting GC overhead limit exceeded again: === spark stdout === 15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 18.0 (TID 5329, 192.168.1.93): java.lang.OutOfMemoryError: GC overhead limit exceeded at ...

Re: loads of memory still GC overhead limit exceeded

2015-02-19 Thread Antony Mayi
it is from within the ALS.trainImplicit() call. BTW, the exception varies between this "GC overhead limit exceeded" and "Java heap space" (which I guess is just a different outcome of the same problem). I just tried another run and here are the logs (filtered) - note I ...

Re: loads of memory still GC overhead limit exceeded

2015-02-19 Thread Sean Owen
...at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187...

Re: loads of memory still GC overhead limit exceeded

2015-02-19 Thread Antony Mayi
...15/02/19 05:41:06 ERROR executor.Executor: Exception in task 131.0 in stage 51.0 (TID 7259) java.lang.OutOfMemoryError: GC overhead limit exceeded at java.lang.reflect.Array.newInstance(Array.java:75) at java.io.ObjectInputStream.readArray(ObjectInputStream.ja...

Re: loads of memory still GC overhead limit exceeded

2015-02-19 Thread Sean Owen
org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187) at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:398) 15/02/19 05:41:06 ERROR executor.Executor: Exception in task 131.0 in stage 51.0 (TID 7259) java.lang.OutOfMemoryError: GC overh...

loads of memory still GC overhead limit exceeded

2015-02-19 Thread Antony Mayi
) java.lang.OutOfMemoryError: GC overhead limit exceeded at java.lang.reflect.Array.newInstance(Array.java:75) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1671) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345) at ...

Re: failing GraphX application ('GC overhead limit exceeded', 'Lost executor', 'Connection refused', etc.)

2015-02-14 Thread Matthew Cornell
): java.lang.OutOfMemoryError: GC overhead limit exceeded 15/02/12 08:05:06 WARN TaskSetManager: Lost task 0.0 in stage 31.1 (TID 48, compute-0-2.wright): FetchFailed(BlockManagerId(0, wright.cs.umass.edu, 60837), shuffleId=0, mapId=1, reduceId=1, message= org.apache.spark.shuffle.FetchFailedException: Failed to

failing GraphX application ('GC overhead limit exceeded', 'Lost executor', 'Connection refused', etc.)

2015-02-12 Thread Matthew Cornell
Hi Folks, I'm running a five-step path-following algorithm on a movie graph with 120K vertices and 400K edges. The graph has vertices for actors, directors, movies, users, and user ratings, and my Scala code is walking the path "rating > movie > rating > user > rating". There are 75K rating no...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-28 Thread Guru Medasani
...Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded. I have YARN configured with yarn.nodemanager.vmem-check-enabled=false and yarn.nodemanager.pmem-check-enabled=false to avoid YARN killing the containers. The stack trace is below. Thanks, Antony. 15/01/27 17:0...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Antony Mayi
17:02:53 ERROR executor.Executor: Exception in task 21.0 in stage 12.0 (TID 1312) java.lang.OutOfMemoryError: GC overhead limit exceeded at java.lang.Integer.valueOf(Integer.java:642) at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:70) at ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Guru Medasani
Can you attach the logs where this is failing? From: Sven Krasser Date: Tuesday, January 27, 2015 at 4:50 PM To: Guru Medasani Cc: Sandy Ryza, Antony Mayi, "user@spark.apache.org" Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded. Since it's an executor ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Sven Krasser
...Date: Tuesday, January 27, 2015 at 3:33 PM To: Antony Mayi Cc: "user@spark.apache.org" Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded. Hi Antony, If you look in the YARN NodeManager logs, do you see that it's killing the exec...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Guru Medasani
: Tuesday, January 27, 2015 at 3:33 PM To: Antony Mayi Cc: "user@spark.apache.org" Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded Hi Antony, If you look in the YARN NodeManager logs, do you see that it's killing the executors? Or are they crashing for a d

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Sandy Ryza
Hi Antony, If you look in the YARN NodeManager logs, do you see that it's killing the executors? Or are they crashing for a different reason? -Sandy On Tue, Jan 27, 2015 at 12:43 PM, Antony Mayi wrote: Hi, I am using spark.yarn.executor.memoryOverhead=8192 yet getting executors crashe...

java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Antony Mayi
Hi, I am using spark.yarn.executor.memoryOverhead=8192 yet executors crash with this error. Does that mean I genuinely don't have enough RAM, or is this a matter of config tuning? Other config options used: spark.storage.memoryFraction=0.3 SPARK_EXECUTOR_MEMORY=14G. Running Spark 1.2.0 as YARN ...
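
For reference, the settings quoted in this thread in spark-defaults form (the heap is expressed as spark.executor.memory rather than the SPARK_EXECUTOR_MEMORY environment variable the poster used; YARN kills containers that exceed heap plus overhead):

    spark.executor.memory                14g
    spark.yarn.executor.memoryOverhead   8192
    spark.storage.memoryFraction         0.3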

Re: Spark-Shell: OOM: GC overhead limit exceeded

2014-10-08 Thread sranga
Increasing the driver memory resolved this issue. Thanks to Nick for the hint. Here is how I am starting the shell: "spark-shell --driver-memory 4g --driver-cores 4 --master local"

Spark-Shell: OOM: GC overhead limit exceeded

2014-10-07 Thread sranga
...memoryFraction 0.1 spark.default.parallelism 24 Any help is appreciated. The stack trace of the error is given below. - Ranga == Stack trace == java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOf(Arrays.java:3...

Re: still "GC overhead limit exceeded" after increasing heap space

2014-10-05 Thread Andrew Ash
...executors. On Oct 1, 2014 9:37 PM, "anny9699" wrote: Hi, After reading some previous posts about this issue, I have increased the java heap space to "-Xms64...

Re: still "GC overhead limit exceeded" after increasing heap space

2014-10-02 Thread Sean Owen
...After reading some previous posts about this issue, I have increased the java heap space to "-Xms64g -Xmx64g", but still met the "java.lang.OutOfMemoryError: GC overhead limit exceeded" error. Does anyone ...

Re: still "GC overhead limit exceeded" after increasing heap space

2014-10-01 Thread Liquan Pei
...memory is 120 GB, so I use "MEMORY_AND_DISK_SER" and Kryo serialization. Thanks a lot!

Re: still "GC overhead limit exceeded" after increasing heap space

2014-10-01 Thread 陈韵竹
...issue, I have increased the java heap space to "-Xms64g -Xmx64g", but still met the "java.lang.OutOfMemoryError: GC overhead limit exceeded" error. Does anyone have other suggestions? I am reading 200 GB of data and my ...

Re: still "GC overhead limit exceeded" after increasing heap space

2014-10-01 Thread Liquan Pei
8 workers, each with 15.7GB memory. What you said makes sense, but if I don't increase heap space, it keeps telling me "GC overhead limit exceeded". Thanks! Anny On Wed, Oct 1, 2014 at 1:41 PM, Liquan Pei [via Apache Spark User List] <...

Re: still "GC overhead limit exceeded" after increasing heap space

2014-10-01 Thread Sean Owen
e to "-Xms64g -Xmx64g", but still met the > "java.lang.OutOfMemoryError: GC overhead limit exceeded" error. Does anyone > have other suggestions? > > I am reading a data of 200 GB and my total memory is 120 GB, so I use > "MEMORY_AND_DISK_SER" and kryo se

Re: still "GC overhead limit exceeded" after increasing heap space

2014-10-01 Thread anny9699
Hi Liquan, I have 8 workers, each with 15.7GB memory. What you said makes sense, but if I don't increase heap space, it keeps telling me "GC overhead limit exceeded". Thanks! Anny On Wed, Oct 1, 2014 at 1:41 PM, Liquan Pei [via Apache Spark User List] < ml-node+s1001560n155
