Hi Shangyu,

Thanks for responding. This code is a refactor of an earlier version that isn't fully scalable because it pulls data to the driver; the refactored version keeps everything on the cluster. I left it running for 7 hours, and the log simply stopped updating. I checked Ganglia, and only one machine's CPU seemed to be doing anything.

The last output in the log corresponds to a spot in my code where it filters an RDD by a locally stored set. No error was thrown. I thought that was OK based on the example code, but just in case, I changed the set to a broadcast variable.

The un-refactored code (which pulls all the data to the driver from time to time) runs in minutes. I've never seen the log just become unresponsive before, and I was wondering whether anyone knows of any heuristics I could check.
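To make the broadcast change concrete, here is a minimal sketch of filtering an RDD by a broadcast set. It is not my actual code: the object name, the helper `filterByBroadcastSet`, the string record type, and the sample data are all made up for illustration, and it assumes a Spark build where the classes live under `org.apache.spark`.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

// Hypothetical names throughout; only sc.broadcast, RDD.filter, parallelize
// and collect are Spark API calls.
object BroadcastFilterSketch {

  // Keep only the records that appear in `allowed`.
  // The set is shipped to the workers as a broadcast variable instead of
  // being captured directly in the filter closure.
  def filterByBroadcastSet(sc: SparkContext,
                           records: RDD[String],
                           allowed: Set[String]): RDD[String] = {
    val allowedBc: Broadcast[Set[String]] = sc.broadcast(allowed)
    records.filter(r => allowedBc.value.contains(r))
  }

  def main(args: Array[String]): Unit = {
    // Local master with two threads, for a quick check only.
    val sc = new SparkContext("local[2]", "BroadcastFilterSketch")
    try {
      val records = sc.parallelize(Seq("a", "b", "c", "d"))
      val kept = filterByBroadcastSet(sc, records, Set("b", "d"))
      println(kept.collect().sorted.mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

The filter itself stays a narrow transformation either way; the only difference from the closure version is how the set reaches the workers.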
Thank you

On Sat, Nov 2, 2013 at 2:55 PM, Shangyu Luo <[email protected]> wrote:

> Yes, I think so. The running time depends on what work you are doing and
> how large it is.
>
> 2013/11/1 Walrus theCat <[email protected]>
>
>> That's what I thought, too. So is it not "hanging", just recalculating
>> for a very long time? The log stops updating and it just gives the output
>> I posted. If there are any suggestions as to parameters to change, or any
>> other data, it would be appreciated.
>>
>> Thank you, Shangyu.
>>
>> On Fri, Nov 1, 2013 at 11:31 AM, Shangyu Luo <[email protected]> wrote:
>>
>>> I think the missing parent may not be abnormal. From my understanding,
>>> when a Spark task cannot find its parent, it can use some metadata to find
>>> the result of its parent or recalculate its parent's value. Imagine, in a
>>> loop, a Spark task trying to find some value from the last iteration's
>>> result.
>>>
>>> 2013/11/1 Walrus theCat <[email protected]>
>>>
>>>> Are there heuristics to check when the scheduler says it is "missing
>>>> parents" and just hangs?
>>>>
>>>> On Thu, Oct 31, 2013 at 4:56 PM, Walrus theCat <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm not sure what's going on here. My code seems to be working thus
>>>>> far (map at SparkLR:90 completed.) What can I do to help the scheduler
>>>>> out here?
>>>>>
>>>>> Thanks
>>>>>
>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(10, 211)
>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: Stage 10 (map at SparkLR.scala:90) finished in 0.923 s
>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: looking for newly runnable stages
>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: running: Set(Stage 11)
>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: waiting: Set(Stage 9, Stage 8)
>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: failed: Set()
>>>>> 13/10/31 02:10:16 INFO scheduler.DAGScheduler: Missing parents for Stage 9: List(Stage 11)
>>>>> 13/10/31 02:10:16 INFO scheduler.DAGScheduler: Missing parents for Stage 8: List(Stage 9)
>>>
>>> --
>>> --
>>>
>>> Shangyu, Luo
>>> Department of Computer Science
>>> Rice University
>>>
>>> --
>>> Not Just Think About It, But Do It!
>>> --
>>> Success is never final.
>>> --
>>> Losers always whine about their best
