Hi Walrus,

Thank you for sharing the solution to your problem. I think I have run into a similar problem before (i.e., one machine is working while the others are idle). I just waited for a long time, and the program eventually continued processing. I am not sure how your program filters an RDD by a locally stored set. If the set is passed into the filter function, I assume it is copied to all worker nodes. But it is good that you solved your problem with a broadcast variable, and the running time seems reasonable!
2013/11/3 Walrus theCat

> Hi Shangyu,
>
> Thanks for responding. This is a refactor of other code that isn't
> completely scalable because it pulls stuff to the driver. This code keeps
> everything on the cluster. I left it running for 7 hours, and the log just
> froze. I checked Ganglia, and only one machine's CPU seemed to be doing
> anything. The last output in the log left my code at a spot where it is
> filtering an RDD by a locally stored set. No error was thrown. I thought
> that was OK based on the example code, but just in case, I changed it so
> it's a broadcast variable. The un-refactored code (which pulls all the data
> to the driver from time to time) runs in minutes. I've never had the log
> just become unresponsive before, and was wondering if anyone knew of any
> heuristics I could check.
>
> Thank you
>
> On Sat, Nov 2, 2013 at 2:55 PM, Shangyu Luo wrote:
>
>> Yes, I think so. The running time depends on what work you are doing and
>> how large it is.
>>
>> 2013/11/1 Walrus theCat
>>
>>> That's what I thought, too. So is it not "hanging", just recalculating
>>> for a very long time? The log stops updating and just gives the output
>>> I posted. If there are any suggestions as to parameters to change, or any
>>> other data, it would be appreciated.
>>>
>>> Thank you, Shangyu.
>>>
>>> On Fri, Nov 1, 2013 at 11:31 AM, Shangyu Luo wrote:
>>>
>>>> I think the missing parents may not be abnormal. From my understanding,
>>>> when a Spark task cannot find its parent, it can use some metadata to find
>>>> the result of its parent or recalculate its parent's value. Imagine a
>>>> loop in which a Spark task tries to find some value from the last
>>>> iteration's result.
>>>>
>>>> 2013/11/1 Walrus theCat
>>>>
>>>>> Are there heuristics to check when the scheduler says it is "missing
>>>>> parents" and just hangs?
>>>>>
>>>>> On Thu, Oct 31, 2013 at 4:56 PM, Walrus theCat wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm not sure what's going on here. My code seems to be working thus
>>>>>> far (map at SparkLR:90 completed.) What can I do to help the scheduler
>>>>>> out here?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(10, 211)
>>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: Stage 10 (map at SparkLR.scala:90) finished in 0.923 s
>>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: looking for newly runnable stages
>>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: running: Set(Stage 11)
>>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: waiting: Set(Stage 9, Stage 8)
>>>>>> 13/10/31 02:10:13 INFO scheduler.DAGScheduler: failed: Set()
>>>>>> 13/10/31 02:10:16 INFO scheduler.DAGScheduler: Missing parents for Stage 9: List(Stage 11)
>>>>>> 13/10/31 02:10:16 INFO scheduler.DAGScheduler: Missing parents for Stage 8: List(Stage 9)
>>>>
>>>> --
>>>> --
>>>>
>>>> Shangyu, Luo
>>>> Department of Computer Science
>>>> Rice University
>>>>
>>>> --
>>>> Not Just Think About It, But Do It!
>>>> --
>>>> Success is never final.
>>>> --
>>>> Losers always whine about their best
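The "Missing parents" lines in the quoted log can be read with a small toy model. In this model, a stage is submitted only after every parent stage it depends on has finished, so Stage 9 waits on Stage 11 and Stage 8 waits on Stage 9. The sketch below is a hypothetical Python simulation, not Spark's `DAGScheduler`. The stage numbers and edges are taken from the log lines. Treating Stage 11 as having no pending parents is an assumption, based on the log listing it as already running.

```python
# Hypothetical toy model of stage readiness; this is NOT Spark's DAGScheduler.
# Edges come from the "Missing parents" log lines; Stage 11 is assumed to have
# no pending parents because the log lists it as running.
PARENTS = {
    8: [9],   # Missing parents for Stage 8: List(Stage 9)
    9: [11],  # Missing parents for Stage 9: List(Stage 11)
    11: [],
}


def missing_parents(stage, finished):
    """Return the parent stages of `stage` whose output is not available yet."""
    return [p for p in PARENTS[stage] if p not in finished]


def simulate(running, waiting):
    """Let every running stage finish, one wave at a time.

    Returns the waves in completion order and any stages left waiting.
    """
    running, waiting = set(running), set(waiting)
    finished = set()
    waves = []
    while running:
        finished |= running
        waves.append(sorted(running))
        # A waiting stage becomes runnable once it has no missing parents.
        ready = {s for s in waiting if not missing_parents(s, finished)}
        waiting -= ready
        running = ready
    return waves, waiting


waves, stuck = simulate(running={11}, waiting={9, 8})
print(missing_parents(9, finished={10}))  # [11]
print(waves)  # [[11], [9], [8]]
print(stuck)  # set()
```

Under this model, if Stage 11 never finishes, Stages 9 and 8 keep showing up as waiting with missing parents. That could happen, for example, if a few slow tasks are still running on a single machine. The result would look like the frozen log and single busy CPU described in the thread.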
