Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-14 Thread Akhil Das
Yep, and it works fine for operations that do not involve any shuffle
(like foreach, count, etc.), while operations that involve a shuffle end
up in an infinite loop. Spark should somehow indicate this instead of
going into an infinite loop.

Thanks
Best Regards

On Thu, Aug 13, 2015 at 11:37 PM, Imran Rashid iras...@cloudera.com wrote:

 oh I see, you are defining your own RDD & Partition types, and you had a
 bug where partition.index did not line up with the partition's slot in
 rdd.getPartitions.  Is that correct?



Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-14 Thread Mridul Muralidharan
What I understood from Imran's mail (and what was referenced in it) is
that the RDD mentioned seems to be violating some basic contracts on how
partitions are used in Spark [1].
They cannot be arbitrarily numbered, have duplicates, etc.


Extending RDD to add functionality is typically for niche cases, and it
requires subclasses to adhere to the explicit (and implicit)
contracts/lifecycles for them.
Using existing RDDs as templates would be a good idea for
customizations - one way to look at it is that using RDD is more in API
space, but extending it is more in SPI space.

Violations would actually not even be detectable by spark-core in the general case.


Regards,
Mridul

[1] Ignoring the array out of bounds, etc. - I am assuming the intent
is to show overlapping partitions, duplicates, index-to-partition
mismatch - that sort of thing.
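
Since these violations won't be flagged for you, a quick self-check on a
custom RDD is easy to write - a minimal sketch against the public RDD
API (the helper name is mine):

import org.apache.spark.rdd.RDD

// Verify the basic Partition contract: the array returned by
// getPartitions must hold, at slot i, a partition whose index is i.
// That single condition also rules out duplicate ids and ids that
// start at 1 instead of 0.
def checkPartitionContract(rdd: RDD[_]): Unit =
  rdd.partitions.zipWithIndex.foreach { case (p, i) =>
    require(p.index == i, s"partition at slot $i reports index ${p.index}")
  }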


On Thu, Aug 13, 2015 at 11:42 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
 Yep, and it works fine for operations that do not involve any shuffle
 (like foreach, count, etc.), while operations that involve a shuffle end
 up in an infinite loop. Spark should somehow indicate this instead of
 going into an infinite loop.

 Thanks
 Best Regards




Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-14 Thread Akhil Das
Thanks for the clarifications, Mridul.

Thanks
Best Regards

On Fri, Aug 14, 2015 at 1:04 PM, Mridul Muralidharan mri...@gmail.com
wrote:

 What I understood from Imran's mail (and what was referenced in it) is
 that the RDD mentioned seems to be violating some basic contracts on
 how partitions are used in Spark [1].
 They cannot be arbitrarily numbered, have duplicates, etc.


 Extending RDD to add functionality is typically for niche cases, and it
 requires subclasses to adhere to the explicit (and implicit)
 contracts/lifecycles for them.
 Using existing RDDs as templates would be a good idea for
 customizations - one way to look at it is that using RDD is more in API
 space, but extending it is more in SPI space.

 Violations would actually not even be detectable by spark-core in the
 general case.


 Regards,
 Mridul

 [1] Ignoring the array out of bounds, etc. - I am assuming the intent
 is to show overlapping partitions, duplicates, index-to-partition
 mismatch - that sort of thing.





Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-13 Thread Imran Rashid
oh I see, you are defining your own RDD & Partition types, and you had a
bug where partition.index did not line up with the partition's slot in
rdd.getPartitions.  Is that correct?
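
Concretely, the contract in question is that getPartitions returns an
array where slot i holds the partition whose index is i. A minimal
sketch of a version that satisfies it, assuming DeadLockPartitions
simply keeps the id it is given as its index:

import org.apache.spark.Partition

// Ids run 0 until totalPartitions with no duplicates, and each
// partition's index matches its slot in the array.
val totalPartitions = 4
val partitionsArray: Array[Partition] =
  Array.tabulate[Partition](totalPartitions)(i => new DeadLockPartitions(i))
partitionsArray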

On Thu, Aug 13, 2015 at 2:40 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 I figured that out, and these are my findings:

 - It enters an infinite loop when there's a duplicate partition id.

 - It enters an infinite loop when the partition ids start from 1
 rather than 0.


 Something like this piece of code (in getPartitions()) can reproduce it:

 val total_partitions = 4
 val partitionsArray: Array[Partition] =
   Array.ofDim[Partition](total_partitions)

 var i = 0

 // Two passes over ids 1 to total_partitions: the ids are duplicated,
 // start at 1 instead of 0, and the second pass runs past the end of
 // the 4-slot array (as noted elsewhere in the thread).
 for (outer <- 0 to 1) {
   for (partition <- 1 to total_partitions) {
     partitionsArray(i) = new DeadLockPartitions(partition)
     i = i + 1
   }
 }

 partitionsArray




 Thanks
 Best Regards







Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-12 Thread Imran Rashid
yikes.

Was this a one-time thing?  Or does it happen consistently?  Can you turn
on debug logging for o.a.s.scheduler (dunno if it will help, but maybe ...)
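
With the stock logging setup that's one line in conf/log4j.properties
(a sketch assuming the default log4j config shipped with Spark 1.4):

# Emit DEBUG-level output from the DAGScheduler and the rest of the
# scheduler package.
log4j.logger.org.apache.spark.scheduler=DEBUG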

On Tue, Aug 11, 2015 at 8:59 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 Hi

 My Spark job (running in local[*] with Spark 1.4.1) reads data from a
 thrift server (I created a custom RDD; it computes the partitions in the
 getPartitions() call, and in compute() the iterator's hasNext returns
 records from these partitions). count() and foreach() are working fine
 and return the correct number of records. But whenever there is a
 shuffle map stage (like reduceByKey etc.), all the tasks execute
 properly but the job enters an infinite loop saying:


15/08/11 13:05:54 INFO DAGScheduler: Resubmitting ShuffleMapStage 1
(map at FilterMain.scala:59) because some of its tasks had failed: 0, 3


 Here's the complete stack trace: http://pastebin.com/hyK7cG8S

 What could be the root cause of this problem? I looked around and bumped
 into this closed JIRA, https://issues.apache.org/jira/browse/SPARK-583
 (which is very, very old).




 Thanks
 Best Regards



Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-11 Thread Akhil Das
Hi

My Spark job (running in local[*] with Spark 1.4.1) reads data from a
thrift server (I created a custom RDD; it computes the partitions in the
getPartitions() call, and in compute() the iterator's hasNext returns
records from these partitions). count() and foreach() are working fine
and return the correct number of records. But whenever there is a
shuffle map stage (like reduceByKey etc.), all the tasks execute
properly but the job enters an infinite loop saying:


15/08/11 13:05:54 INFO DAGScheduler: Resubmitting ShuffleMapStage 1
(map at FilterMain.scala:59) because some of its tasks had failed: 0, 3


Here's the complete stack trace: http://pastebin.com/hyK7cG8S

What could be the root cause of this problem? I looked around and bumped
into this closed JIRA, https://issues.apache.org/jira/browse/SPARK-583
(which is very, very old).
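
For anyone trying to reproduce this, a custom RDD of the kind described
above might look roughly like the sketch below - class and method names
are illustrative, with a stub standing in for the real thrift calls
(this is not the actual FilterMain code). Note the partitions are
numbered 0 until numParts, which turns out to be the contract the
replies in this thread hinge on:

import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// One partition of the custom RDD; index must match its slot in
// the array returned by getPartitions.
class ThriftPartition(override val index: Int) extends Partition

class ThriftRDD(sc: SparkContext, numParts: Int)
    extends RDD[String](sc, Nil) {

  // Hypothetical stand-in for fetching one partition's rows over thrift.
  private def fetchRecords(partitionId: Int): Iterator[String] =
    Iterator.empty

  // Slot i holds the partition whose index is i: ids 0 until numParts,
  // no duplicates.
  override def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numParts)(i => new ThriftPartition(i))

  override def compute(split: Partition, context: TaskContext): Iterator[String] =
    fetchRecords(split.index)
}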




Thanks
Best Regards