Here is the stage overview: [image: Inline image 2] and here are the stage details for stage 0: [image: Inline image 1] The transformations from the first stage to the second are trivial, so they should not be the bottleneck (apart from keyBy().groupByKey(), which causes the shuffle write/read).
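As a side note on the shuffle itself, here is a minimal plain-Python sketch (no Spark required, hypothetical sample data) of why keyBy().groupByKey() shuffles every record, whereas a map-side combine, as reduceByKey does, shuffles only one partial value per key per partition:

```python
# Illustration (plain Python, not Spark's actual shuffle path):
# groupByKey ships every (key, value) pair across the network, while
# reduceByKey first combines values per key on the map side, so far
# fewer records are shuffled.
from collections import defaultdict

# Hypothetical records held by one map partition.
records = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]

# groupByKey: every record crosses the network to the reducer.
shuffled_by_group = len(records)

# reduceByKey(_ + _): map-side combine collapses each key to one partial sum.
partials = defaultdict(int)
for key, value in records:
    partials[key] += value
shuffled_by_reduce = len(partials)

print(shuffled_by_group, shuffled_by_reduce)  # 5 records vs. 2 partial sums
```

If the downstream aggregation permits it, replacing groupByKey with reduceByKey (available in Spark 0.8.x) reduces shuffle write/read volume accordingly; this only sketches the shuffle-size arithmetic, not Spark's internals.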
Kind regards,
Domen

On Thu, Mar 20, 2014 at 8:38 PM, Mayur Rustagi [via Apache Spark User List] wrote:
> I would have preferred the stage window details & the aggregate task
> details (above the task list).
> Basically, if you run a job, it translates to multiple stages, and each stage
> translates to multiple tasks (each run on a worker core).
> So some breakup like:
> my job is taking 16 min
> 3 stages, stage 1: 5 min, stage 2: 10 min, stage 3: 1 min
> In stage 2, give me the aggregate task screenshot, which shows the 50th,
> 75th, and 100th percentiles.
> Regards,
> Mayur
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
> On Thu, Mar 20, 2014 at 9:55 AM, sparrow wrote:
>
>> This is what the web UI looks like:
>> [image: Inline image 1]
>>
>> I also tail all the worker logs, and these are the last entries before
>> the waiting begins:
>>
>> 14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, minRequest: 10066329
>> 14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 29853 non-zero-bytes blocks out of 37714 blocks
>> 14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 5 remote gets in 62 ms
>> [PSYoungGen: 12464967K->3767331K(10552192K)] 36074093K->29053085K(44805696K), 0.6765460 secs] [Times: user=5.35 sys=0.02, real=0.67 secs]
>> [PSYoungGen: 10779466K->3203826K(9806400K)] 35384386K->31562169K(44059904K), 0.6925730 secs] [Times: user=5.47 sys=0.00, real=0.70 secs]
>>
>> From the screenshot above you can see that tasks take ~6 minutes to
>> complete. The time the tasks take seems to depend on the amount of
>> input data: if the S3 input string captures 2.5 times less data (less
>> data to shuffle write and later read), the same tasks take 1 minute.
>> Any idea how to debug what the workers are doing?
>>
>> Domen
>>
>> On Wed, Mar 19, 2014 at 5:27 PM, Mayur Rustagi [via Apache Spark User List] wrote:
>>
>>> You could have some outlier task that is preventing the next set of
>>> stages from launching. Can you check the stages' state in the Spark
>>> web UI: is any task running, or is everything halted?
>>> Regards,
>>> Mayur
>>>
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>>
>>>
>>> On Wed, Mar 19, 2014 at 5:40 AM, Domen Grabec wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a cluster of 16 nodes running Spark 0.8.1; each node has 69 GB
>>>> of RAM (50 GB goes to Spark) and 8 cores. I have a groupByKey operation
>>>> that causes a wide RDD dependency, so shuffle write and shuffle read
>>>> are performed.
>>>>
>>>> For some reason all worker threads seem to sleep for about 3-4 minutes
>>>> each time they perform a shuffle read and complete a set of tasks. See
>>>> the graphs below: no resources are being utilized during specific time
>>>> windows.
>>>>
>>>> Each time 3-4 minutes pass, the next set of tasks is grabbed and
>>>> processed, and then another waiting period begins.
>>>>
>>>> Each task has an input of 80 MB +- 5 MB of data to shuffle read.
>>>>
>>>> [image: Inline image 1]
>>>>
>>>> Here <http://pastebin.com/UHWMdTRY> is a link to a thread dump taken
>>>> in the middle of the waiting period. Any idea what could cause the
>>>> long waits?
>>>>
>>>> Kind regards,
>>>> Domen
> stageDetails.png (30K) <http://apache-spark-user-list.1001560.n3.nabble.com/attachment/2988/0/stageDetails.png>
> stages.png (80K) <http://apache-spark-user-list.1001560.n3.nabble.com/attachment/2988/1/stages.png>
