Just to elaborate more on what Silvio wrote below, check whether you are referencing a class or object member variable in a function literal/closure passed to one of the RDD methods.
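A minimal sketch of what that looks like (the class and field names here are hypothetical, just for illustration): referencing a member field inside an RDD closure is really a reference through `this`, so Spark serializes the whole enclosing object into every task. Copying the field into a local val first keeps the closure small.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

class Lookup(sc: SparkContext) {
  // Imagine this map is large (hundreds of KB or more).
  val bigMap: Map[String, Int] = Map("a" -> 1)

  // Problem: `bigMap` is actually `this.bigMap`, so all of `this`
  // (and everything it references) is serialized with every task.
  def tagAll(rdd: RDD[String]): RDD[Int] =
    rdd.map(k => bigMap.getOrElse(k, 0))

  // Fix: copy the field into a local val first; the closure then
  // captures only the map, not the enclosing object.
  def tagAllLocal(rdd: RDD[String]): RDD[Int] = {
    val localMap = bigMap
    rdd.map(k => localMap.getOrElse(k, 0))
  }
}
```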
Mohammed
Author: Big Data Analytics with Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>

From: Silvio Fiorito [mailto:silvio.fior...@granturing.com]
Sent: Wednesday, March 2, 2016 8:43 PM
To: Bijuna; user
Subject: RE: Stage contains task of large size

One source of this could be more than you intended (or realized) getting serialized as part of your operations. What are the transformations you're using? Are you referencing local instance variables in your driver app as part of your transformations? You may, for instance, have a large collection that you're using in a transformation; it will get serialized and sent to each executor. If you do have something like that, look to use broadcast variables instead.

From: Bijuna<mailto:bij...@gmail.com>
Sent: Wednesday, March 2, 2016 11:20 PM
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Stage contains task of large size

Spark users,

We are running a Spark application in standalone mode. We see warning messages in the logs that say "Stage 46 contains a task of very large size (983 KB). The maximum recommended task size is 100 KB." What is the recommended approach to fix this warning?

Thank you,
Bijuna
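The broadcast-variable suggestion quoted above can be sketched roughly as follows. This is an assumed setup, not the original poster's code: `loadTable` stands in for however the large collection is built, and `lines` for the RDD being transformed. Broadcasting ships the table to each executor once, instead of serializing it into every task closure.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def tagWithBroadcast(sc: SparkContext, lines: RDD[String]): RDD[Int] = {
  // Assume this collection is large enough to trigger the
  // "task of very large size" warning if captured directly.
  val bigTable: Map[String, Int] = loadTable() // hypothetical loader

  // Ship the table to executors once; tasks capture only the
  // small Broadcast handle and read it via .value.
  val bcTable = sc.broadcast(bigTable)
  lines.map(k => bcTable.value.getOrElse(k, 0))
}
```

When the broadcast is no longer needed, `bcTable.unpersist()` releases the executor-side copies.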