Re: Looking for help about stackoverflow in spark

2016-06-30 Thread Chanh Le
Hi John,
I think this relates to driver memory more than the other things you mentioned.

Can you try increasing the driver memory?
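
A minimal sketch of where that setting lives, purely as an illustration (the
memory values and the app name below are placeholders, not recommendations):

    // Driver memory must be fixed before the driver JVM starts, so in client
    // mode it is normally passed on the command line, for example:
    //   spark-submit --driver-memory 8g --executor-memory 8g your-app.jar
    // Setting it in SparkConf only takes effect when the driver is launched
    // for you (e.g. cluster mode).
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("mongo-load")
      .set("spark.driver.memory", "8g")
    val sc = new SparkContext(conf)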




> On Jul 1, 2016, at 9:03 AM, johnzeng  wrote:
> 
> I am trying to load a 1 TB collection from MongoDB into a Spark cluster, but I
> keep getting a stack overflow error after it has been running for a while.
> 
> I have posted a question on stackoverflow.com and tried all the advice given
> there, but nothing works:
> 
> how to load large database into spark
> <http://stackoverflow.com/questions/38096502/how-to-load-large-table-in-spark>
>   
> 
> I have tried:
> 1. Using persist with MEMORY_AND_DISK: same error after roughly the same
> running time.
> 2. Adding more instances: same error after the same running time.
> 3. Running the same script on a much smaller collection: everything works, so
> I think the code itself is fine.
> 4. Removing the reduce step: same error after the same running time.
> 5. Removing the map step: same error after the same running time.
> 6. Changing the SQL I use: it runs faster, but the same error appears after a
> shorter time.
> 7. Retrieving "_id" instead of "u_at" and "c_at": same error after the same
> running time.
> 
> Does anyone know how many resources I need to handle this 1 TB collection? I
> only retrieve two fields from it, and those fields are only about 1% of each
> document (because each document contains an array of 90+ embedded documents).
> 
> 
> 





Looking for help about stackoverflow in spark

2016-06-30 Thread johnzeng
I am trying to load a 1 TB collection from MongoDB into a Spark cluster, but I
keep getting a stack overflow error after it has been running for a while.

I have posted a question on stackoverflow.com and tried all the advice given
there, but nothing works:

how to load large database into spark
<http://stackoverflow.com/questions/38096502/how-to-load-large-table-in-spark>  

I have tried:
1. Using persist with MEMORY_AND_DISK (see the sketch after this list): same
error after roughly the same running time.
2. Adding more instances: same error after the same running time.
3. Running the same script on a much smaller collection: everything works, so I
think the code itself is fine.
4. Removing the reduce step: same error after the same running time.
5. Removing the map step: same error after the same running time.
6. Changing the SQL I use: it runs faster, but the same error appears after a
shorter time.
7. Retrieving "_id" instead of "u_at" and "c_at": same error after the same
running time.
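
For reference, a minimal sketch of the persist call from item 1; `collectionRdd`
below is only a stand-in for whatever RDD the job actually builds from the Mongo
connector:

    import org.apache.spark.storage.StorageLevel

    // Sketch only: `collectionRdd` stands in for the RDD produced by the job.
    val persisted = collectionRdd.persist(StorageLevel.MEMORY_AND_DISK)
    // Later actions reuse the cached partitions, spilling blocks to disk when
    // they do not fit in memory instead of recomputing them from the lineage.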

Does anyone know how many resources I need to handle this 1 TB collection? I
only retrieve two fields from it, and those fields are only about 1% of each
document (because each document contains an array of 90+ embedded documents).






Re: StackOverflow in Spark

2016-06-13 Thread Terry Hoo
Maybe the same issue as SPARK-6847
<https://issues.apache.org/jira/browse/SPARK-6847>, which has been fixed in
Spark 2.0.

Regards
- Terry

On Mon, Jun 13, 2016 at 3:15 PM, Michel Hubert  wrote:

>
>
> I’ve found my problem.
>
>
>
> I’ve got a DAG with two consecutive “updateStateByKey” functions.
>
> When I only process (map/foreachRDD/JavaEsSpark) the state of the last
> “updateStateByKey” function, I get a StackOverflowError after a while (the
> lineage gets too long).
>
> But when I also do some processing (foreachRDD/rdd.take) on the first
> “updateStateByKey”, there is no problem.
>
> Does this make sense? Probably the “long lineage” problem.
>
> But why should I have such a “lineage problem” when Spark claims to be an
> “abstract/high-level” architecture? Why should I be worried about “long
> lineage”? It seems a contradiction with the abstract/high-level (functional
> programming) approach when I have to know/consider how Spark does it.
>
>
>
>
>
>
>
> *From:* Rishabh Wadhawan [mailto:rishabh...@gmail.com]
> *Sent:* Thursday, June 2, 2016 06:06
> *To:* Yash Sharma 
> *CC:* Ted Yu ; Matthew Young ;
> Michel Hubert ; user@spark.apache.org
> *Subject:* Re: StackOverflow in Spark
>
>
>
> A stack overflow occurs when the DAG gets too long, i.e. when many
> transformations accumulate over many iterations. Please use checkpointing to
> persist intermediate results and break the lineage, which gets you away from
> this stack overflow error. Look into the checkpoint function.
>
> Thanks
>
> Hope it helps. Let me know if you need any more help.
>
> On Jun 1, 2016, at 8:18 PM, Yash Sharma  wrote:
>
>
>
> Not sure if it's related, but I got a similar stack overflow error some time
> back while reading files and converting them to Parquet.
>
>
>
>
>
>
> Stack trace-
> 16/06/02 02:23:54 INFO YarnAllocator: Driver requested a total number of
> 32769 executor(s).
> 16/06/02 02:23:54 INFO ExecutorAllocationManager: Requesting 16384 new
> executors because tasks are backlogged (new desired total will be 32769)
> 16/06/02 02:23:54 INFO YarnAllocator: Will request 24576 executor
> containers, each with 5 cores and 22528 MB memory including 2048 MB overhead
> 16/06/02 02:23:55 WARN ApplicationMaster: Reporter thread fails 2 time(s)
> in a row.
> at
> scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:28)
> at
> scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:24)
> at
> scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
> at
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
> at
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
> at
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
> at
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>
> java.lang.StackOverflowError
>
>
>
>
>
> On Thu, Jun 2, 2016 at 12:58 PM, Ted Yu  wrote:
>
> Looking at Michel's stack trace, it seems to be a different issue.
>
>
> On Jun 1, 2016, at 7:45 PM, Matthew Young  wrote:
>
> Hi,
>
>
>
> It's related to a bug that has been fixed in Spark, JIRA ticket SPARK-6847
> <https://issues.apache.org/jira/browse/SPARK-6847>
>
>
>
> Matthew Yang
>
>
>
> On Wed, May 25, 2016 at 7:48 PM, Michel Hubert  wrote:
>
>
>
> Hi,
>
>
>
>
>
> I have a Spark application which generates StackOverflowError exceptions
> after 30+ min.
>
>
>
> Anyone any ideas?
>
>
>
>
>
>
>
>
>
>
>
>
>
> 16/05/25 10:48:51 WARN scheduler.TaskSetManager: Lost task 0.0 in stage
> 55449.0 (TID 5584, host81440-cld.opentsp.com):
> java.lang.StackOverflowError
>
> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>
> ·at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>
> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>
> ·at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>
> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>
> ·at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>
> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>
> ·at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>
> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java

RE: StackOverflow in Spark

2016-06-13 Thread Michel Hubert

I’ve found my problem.

I’ve got a DAG with two consecutive “updateStateByKey” functions.
When I only process (map/foreachRDD/JavaEsSpark) the state of the last
“updateStateByKey” function, I get a StackOverflowError after a while (the
lineage gets too long).

But when I also do some processing (foreachRDD/rdd.take) on the first
“updateStateByKey”, there is no problem.

Does this make sense? Probably the “long lineage” problem.

But why should I have such a “lineage problem” when Spark claims to be an
“abstract/high-level” architecture? Why should I be worried about “long
lineage”? It seems a contradiction with the abstract/high-level (functional
programming) approach when I have to know/consider how Spark does it.
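
A minimal sketch of the shape of the job described above, assuming a Spark
Streaming context with checkpointing enabled; the socket source, state function
and paths are placeholders rather than the original code:

    // Sketch only, not the original job.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("two-state-streams")
    val ssc  = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("hdfs:///tmp/checkpoints")   // required for updateStateByKey

    val counts = ssc.socketTextStream("localhost", 9999).map(line => (line, 1L))

    val update = (values: Seq[Long], state: Option[Long]) =>
      Some(state.getOrElse(0L) + values.sum)

    val first  = counts.updateStateByKey(update)   // first stateful stage
    val second = first.updateStateByKey(update)    // second stateful stage

    // Acting only on `second` lets the lineage behind `first` grow every batch;
    // also forcing an action on `first`, as described above, keeps it short.
    first.foreachRDD(rdd => rdd.take(1))
    second.foreachRDD(rdd => println(rdd.count()))

    ssc.start()
    ssc.awaitTermination()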



From: Rishabh Wadhawan [mailto:rishabh...@gmail.com]
Sent: Thursday, June 2, 2016 06:06
To: Yash Sharma 
CC: Ted Yu ; Matthew Young ; Michel 
Hubert ; user@spark.apache.org
Subject: Re: StackOverflow in Spark

A stack overflow occurs when the DAG gets too long, i.e. when many
transformations accumulate over many iterations. Please use checkpointing to
persist intermediate results and break the lineage, which gets you away from
this stack overflow error. Look into the checkpoint function.
Thanks
Hope it helps. Let me know if you need any more help.
On Jun 1, 2016, at 8:18 PM, Yash Sharma 
mailto:yash...@gmail.com>> wrote:

Not sure if it's related, but I got a similar stack overflow error some time
back while reading files and converting them to Parquet.



Stack trace-
16/06/02 02:23:54 INFO YarnAllocator: Driver requested a total number of 32769 
executor(s).
16/06/02 02:23:54 INFO ExecutorAllocationManager: Requesting 16384 new 
executors because tasks are backlogged (new desired total will be 32769)
16/06/02 02:23:54 INFO YarnAllocator: Will request 24576 executor containers, 
each with 5 cores and 22528 MB memory including 2048 MB overhead
16/06/02 02:23:55 WARN ApplicationMaster: Reporter thread fails 2 time(s) in a 
row.
at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:28)
at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:24)
at 
scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
java.lang.StackOverflowError


On Thu, Jun 2, 2016 at 12:58 PM, Ted Yu 
mailto:yuzhih...@gmail.com>> wrote:
Looking at Michel's stack trace, it seems to be a different issue.

On Jun 1, 2016, at 7:45 PM, Matthew Young 
mailto:taige...@gmail.com>> wrote:
Hi,

It's related to a bug that has been fixed in Spark, JIRA ticket
SPARK-6847<https://issues.apache.org/jira/browse/SPARK-6847>

Matthew Yang

On Wed, May 25, 2016 at 7:48 PM, Michel Hubert 
mailto:mich...@phact.nl>> wrote:

Hi,


I have a Spark application which generates StackOverflowError exceptions after
30+ min.

Anyone any ideas?






16/05/25 10:48:51 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 55449.0 
(TID 5584, host81440-cld.opentsp.com<http://host81440-cld.opentsp.com/>): 
java.lang.StackOverflowError
·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
·at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
·at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
·at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
·at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
·at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
·at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
·at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
·at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
·at java.lang.reflect.Method.invoke(Method.java:606)
·at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
·at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
·at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
·at java.io.ObjectInputStream.

Re: StackOverflow in Spark

2016-06-01 Thread Rishabh Wadhawan
A stack overflow occurs when the DAG gets too long, i.e. when many
transformations accumulate over many iterations. Please use checkpointing to
persist intermediate results and break the lineage, which gets you away from
this stack overflow error. Look into the checkpoint function, as sketched below.
Thanks
Hope it helps. Let me know if you need any more help.
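
For a non-streaming job, a minimal sketch of the checkpoint pattern; the
directory, iteration count and toy transformation are placeholders for
illustration only:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: periodically checkpoint an RDD that is rebuilt in a loop so
    // that its lineage does not grow without bound.
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-demo"))
    sc.setCheckpointDir("hdfs:///tmp/checkpoints")

    var data = sc.parallelize(1 to 1000000)
    for (i <- 1 to 100) {
      data = data.map(_ + 1)        // each iteration adds a step to the lineage
      if (i % 10 == 0) {
        data.checkpoint()           // mark the RDD for checkpointing
        data.count()                // an action materialises the checkpoint and
                                    // truncates the lineage at this point
      }
    }
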
> On Jun 1, 2016, at 8:18 PM, Yash Sharma  wrote:
> 
> Not sure if it's related, but I got a similar stack overflow error some time 
> back while reading files and converting them to Parquet.
> 
> 
> 
> Stack trace-
> 16/06/02 02:23:54 INFO YarnAllocator: Driver requested a total number of 
> 32769 executor(s).
> 16/06/02 02:23:54 INFO ExecutorAllocationManager: Requesting 16384 new 
> executors because tasks are backlogged (new desired total will be 32769)
> 16/06/02 02:23:54 INFO YarnAllocator: Will request 24576 executor containers, 
> each with 5 cores and 22528 MB memory including 2048 MB overhead
> 16/06/02 02:23:55 WARN ApplicationMaster: Reporter thread fails 2 time(s) in 
> a row.
> at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:28)
> at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:24)
> at 
> scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
> at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
> at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
> at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
> java.lang.StackOverflowError
> 
> 
> On Thu, Jun 2, 2016 at 12:58 PM, Ted Yu  > wrote:
> Looking at Michel's stack trace, it seems to be a different issue.
> 
> On Jun 1, 2016, at 7:45 PM, Matthew Young  > wrote:
> 
>> Hi,
>> 
>> It's related to a bug that has been fixed in Spark, JIRA ticket SPARK-6847
>> 
>> 
>> Matthew Yang
>> 
>> On Wed, May 25, 2016 at 7:48 PM, Michel Hubert > > wrote:
>>  
>> 
>> Hi,
>> 
>>  
>> 
>>  
>> 
>> I have a Spark application which generates StackOverflowError exceptions
>> after 30+ min.
>> 
>>  
>> 
>> Anyone any ideas?
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>> 16/05/25 10:48:51 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 
>> 55449.0 (TID 5584, host81440-cld.opentsp.com 
>> ): java.lang.StackOverflowError
>> 
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>> 
>> ·at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> 
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> 
>> ·at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> 
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> 
>> ·at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> 
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> 
>> ·at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> 
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> 
>> ·at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> 
>> ·at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>> 
>> ·at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>> 
>> ·at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> 
>> ·at java.lang.reflect.Method.invoke(Method.java:606)
>> 
>> ·at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>> 
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>> 
>> ·at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> 
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> 
>> ·at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> 
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> 
>> ·at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> 
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> 
>> ·at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> 
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> 
>> ·at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> 
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> 
>> ·at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> 
>> ·at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>> 
>> at sun.reflect.Ge

Re: StackOverflow in Spark

2016-06-01 Thread Yash Sharma
Not sure if it's related, but I got a similar stack overflow error some time
back while reading files and converting them to Parquet.
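
For context, the generic shape of such a job looks roughly like this; the paths
and the JSON input format are assumptions for illustration, not the actual
pipeline:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Sketch only: read JSON files and write them back out as Parquet.
    val sc  = new SparkContext(new SparkConf().setAppName("to-parquet"))
    val sql = new SQLContext(sc)

    val df = sql.read.json("hdfs:///data/input/*.json")
    df.write.parquet("hdfs:///data/output/parquet")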



> Stack trace-
> 16/06/02 02:23:54 INFO YarnAllocator: Driver requested a total number of
> 32769 executor(s).
> 16/06/02 02:23:54 INFO ExecutorAllocationManager: Requesting 16384 new
> executors because tasks are backlogged (new desired total will be 32769)
> 16/06/02 02:23:54 INFO YarnAllocator: Will request 24576 executor
> containers, each with 5 cores and 22528 MB memory including 2048 MB overhead
> 16/06/02 02:23:55 WARN ApplicationMaster: Reporter thread fails 2 time(s)
> in a row.
> at
> scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:28)
> at
> scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:24)
> at
> scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
> at
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
> at
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
> at
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
> at
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)

java.lang.StackOverflowError
>


On Thu, Jun 2, 2016 at 12:58 PM, Ted Yu  wrote:

> Looking at Michel's stack trace, it seems to be a different issue.
>
> On Jun 1, 2016, at 7:45 PM, Matthew Young  wrote:
>
> Hi,
>
> It's related to a bug that has been fixed in Spark, JIRA ticket SPARK-6847
> 
>
> Matthew Yang
>
> On Wed, May 25, 2016 at 7:48 PM, Michel Hubert  wrote:
>
>>
>>
>> Hi,
>>
>>
>>
>>
>>
>> I have a Spark application which generates StackOverflowError exceptions
>> after 30+ min.
>>
>>
>>
>> Anyone any ideas?
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> 16/05/25 10:48:51 WARN scheduler.TaskSetManager: Lost task 0.0 in stage
>> 55449.0 (TID 5584, host81440-cld.opentsp.com):
>> java.lang.StackOverflowError
>>
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>>
>> ·at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>
>> ·at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>
>> ·at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>
>> ·at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>
>> ·at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>
>> ·at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>
>> ·at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>>
>> ·at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>
>> ·at java.lang.reflect.Method.invoke(Method.java:606)
>>
>> ·at
>> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>>
>> ·at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>
>> ·at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>
>> ·at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>
>> ·at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>
>> ·at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>
>> ·at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>
>> ·at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>
>> at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown
>> Source)
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Driver stacktrace:
>>
>> ·at org.apache.spark.scheduler.DAGScheduler.org
>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>>
>> ·at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>>
>> ·at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>>
>> ·at
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

Re: StackOverflow in Spark

2016-06-01 Thread Ted Yu
Looking at Michel's stack trace, it seems to be a different issue.

> On Jun 1, 2016, at 7:45 PM, Matthew Young  wrote:
> 
> Hi,
> 
> It's related to a bug that has been fixed in Spark, JIRA ticket SPARK-6847
> 
> Matthew Yang
> 
>> On Wed, May 25, 2016 at 7:48 PM, Michel Hubert  wrote:
>>  
>> 
>> Hi,
>> 
>>  
>> 
>>  
>> 
>> I have a Spark application which generates StackOverflowError exceptions 
>> after 30+ min.
>> 
>>  
>> 
>> Anyone any ideas?
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>> 16/05/25 10:48:51 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 
>> 55449.0 (TID 5584, host81440-cld.opentsp.com): java.lang.StackOverflowError
>> 
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>> 
>> ·at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> 
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> 
>> ·at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> 
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> 
>> ·at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> 
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> 
>> ·at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> 
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> 
>> ·at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> 
>> ·at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>> 
>> ·at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>> 
>> ·at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> 
>> ·at java.lang.reflect.Method.invoke(Method.java:606)
>> 
>> ·at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>> 
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>> 
>> ·at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> 
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> 
>> ·at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> 
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> 
>> ·at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> 
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> 
>> ·at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> 
>> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> 
>> ·at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> 
>> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> 
>> ·at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> 
>> ·at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>> 
>> at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown 
>> Source)
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>> Driver stacktrace:
>> 
>> ·at 
>> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>> 
>> ·at 
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>> 
>> ·at 
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>> 
>> ·at 
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> 
>> ·at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> 
>> ·at 
>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>> 
>> ·at 
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>> 
>> ·at 
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>> 
>> ·at scala.Option.foreach(Option.scala:236)
>> 
>> ·at 
>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>> 
>> ·at 
>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>> 
>> ·at 
>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>> 
>> ·at 
>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>> 
>> ·at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>> 
>> ·at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>> 
>> ·at org.apache.spark.SparkContext.runJob(SparkContext.scala:1843)
>> 
>> ·at org.apache.spark.SparkContext.runJob(SparkContext.scala:1856)
>> 
>> ·at org.apache.spark.SparkContext.runJob(SparkContext.scala:1933)
>> 
>> ·at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:67)
>> 
>> ·at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:54)
>> 
>> ·at org.elasticsearch.spark.rdd.EsSpark$.saveJsonToEs(EsSpark.scala:

Re: StackOverflow in Spark

2016-06-01 Thread Matthew Young
Hi,

It's related to a bug that has been fixed in Spark, JIRA ticket SPARK-6847


Matthew Yang

On Wed, May 25, 2016 at 7:48 PM, Michel Hubert  wrote:

>
>
> Hi,
>
>
>
>
>
> I have a Spark application which generates StackOverflowError exceptions
> after 30+ min.
>
>
>
> Anyone any ideas?
>
>
>
>
>
>
>
>
>
>
>
>
>
> 16/05/25 10:48:51 WARN scheduler.TaskSetManager: Lost task 0.0 in stage
> 55449.0 (TID 5584, host81440-cld.opentsp.com):
> java.lang.StackOverflowError
>
> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>
> ·at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>
> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>
> ·at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>
> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>
> ·at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>
> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>
> ·at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>
> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>
> ·at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>
> ·at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>
> ·at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>
> ·at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> ·at java.lang.reflect.Method.invoke(Method.java:606)
>
> ·at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>
> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>
> ·at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>
> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>
> ·at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>
> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>
> ·at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>
> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>
> ·at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>
> ·at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>
> ·at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>
> ·at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>
> ·at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>
> ·at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>
> at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown
> Source)
>
>
>
>
>
>
>
>
>
> Driver stacktrace:
>
> ·at org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>
> ·at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>
> ·at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>
> ·at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>
> ·at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>
> ·at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>
> ·at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>
> ·at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>
> ·at scala.Option.foreach(Option.scala:236)
>
> ·at
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>
> ·at
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>
> ·at
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>
> ·at
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>
> ·at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> ·at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>
> ·at org.apache.spark.SparkContext.runJob(SparkContext.scala:1843)
>
> ·at org.apache.spark.SparkContext.runJob(SparkContext.scala:1856)
>
> ·at org.apache.spark.SparkContext.runJob(SparkContext.scala:1933)
>
> ·at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:67)
>
> ·at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:54)
>
> ·at org.elasticsearch.spark.rdd.EsSpark$.saveJsonToEs(EsSpark.scala:90)
>
> ·at
> org.elasticsearch.spark.rdd.api.java.JavaEsSpark$.saveJsonToEs(JavaEsSpark.scala:62)
>
> at
> org.elasticsearch.spark.rdd.api.java.JavaEsSpark.saveJsonToEs(JavaEsSpark.scala)
>
>

StackOverflow in Spark

2016-05-25 Thread Michel Hubert

Hi,


I have a Spark application which generates StackOverflowError exceptions after
30+ min.

Anyone any ideas?

It seems like a problem with deserialization of checkpoint data?





16/05/25 10:48:51 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 55449.0 
(TID 5584, host81440-cld.opentsp.com): java.lang.StackOverflowError
*at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
*at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
*at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
*at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
*at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
*at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
*at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
*at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
*at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
*at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
*at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
*at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
*at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
*at java.lang.reflect.Method.invoke(Method.java:606)
*at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
*at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
*at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
*at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
*at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
*at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
*at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
*at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
*at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
*at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
*at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
*at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
*at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
*at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)




Driver stacktrace:
*at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
*at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
*at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
*at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
*at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
*at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
*at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
*at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
*at scala.Option.foreach(Option.scala:236)
*at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
*at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
*at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
*at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
*at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
*at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
*at org.apache.spark.SparkContext.runJob(SparkContext.scala:1843)
*at org.apache.spark.SparkContext.runJob(SparkContext.scala:1856)
*at org.apache.spark.SparkContext.runJob(SparkContext.scala:1933)
*at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:67)
*at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:54)
*at org.elasticsearch.spark.rdd.EsSpark$.saveJsonToEs(EsSpark.scala:90)
*at 
org.elasticsearch.spark.rdd.api.java.JavaEsSpark$.saveJsonToEs(JavaEsSpark.scala:62)
at 
org.elasticsearch.spark.rdd.api.java.JavaEsSpark.saveJsonToEs(JavaEsSpark.scala)