Move your count operation outside the foreach and use a broadcast to access it inside the foreach.

On Aug 17, 2015 10:34 AM, "Priya Ch" <learnings.chitt...@gmail.com> wrote:
> Looks like this is because of SPARK-5063:
>
> RDD transformations and actions can only be invoked by the driver, not
> inside of other transformations; for example, rdd1.map(x =>
> rdd2.values.count() * x) is invalid because the values transformation and
> count action cannot be performed inside of the rdd1.map transformation. For
> more information, see SPARK-5063.
>
> On Mon, Aug 17, 2015 at 8:13 PM, Preetam <preetam...@gmail.com> wrote:
>
>> The error could be because of the missing brackets after the word cache -
>> .ticketRdd.cache()
>>
>> > On Aug 17, 2015, at 7:26 AM, Priya Ch <learnings.chitt...@gmail.com>
>> wrote:
>> >
>> > Hi All,
>> >
>> > Thank you very much for the detailed explanation.
>> >
>> > I have a scenario like this:
>> > I have an RDD of ticket records and another RDD of booking records. For
>> > each ticket record, I need to check whether any link exists in the
>> > booking table.
>> >
>> > val ticketCachedRdd = ticketRdd.cache
>> >
>> > ticketRdd.foreach {
>> >   ticket =>
>> >     // this function queries the booking table and retrieves the booking rows
>> >     val bookingRecords = queryOnBookingTable(date, flightNumber, flightCarrier)
>> >     println(ticketCachedRdd.count) // this is throwing a NullPointerException
>> > }
>> >
>> > Is there something wrong with the count? I am trying to use the count of
>> > the cached RDD when looping through the actual RDD. What's wrong with this?
>> >
>> > Thanks,
>> > Padma Ch
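
For reference, here is a minimal sketch of the suggestion at the top of this thread: run count() on the driver first, then broadcast the resulting number so that the foreach closure never touches an RDD. The Ticket case class, the sample records, the TicketCountExample object and the local master setting are all assumptions made for illustration; the original message does not show the ticket schema or the cluster setup.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical record type; the thread does not show the real ticket schema.
case class Ticket(date: String, flightNumber: String, flightCarrier: String)

object TicketCountExample {

  // Counts the tickets on the driver, then reads that count inside foreach
  // through a broadcast variable. Returns the driver-side count.
  def run(sc: SparkContext): Long = {
    // Made-up sample data standing in for the real ticket records.
    val ticketRdd = sc.parallelize(Seq(
      Ticket("2015-08-17", "101", "AA"),
      Ticket("2015-08-17", "202", "BA")
    )).cache()

    // The action runs here, on the driver, outside any other RDD operation.
    val ticketCount: Long = ticketRdd.count()

    // Ship the plain Long to the executors as a read-only broadcast value.
    val ticketCountBc = sc.broadcast(ticketCount)

    ticketRdd.foreach { ticket =>
      // Only the broadcast value is referenced here, never an RDD (SPARK-5063).
      println(s"${ticket.flightCarrier}${ticket.flightNumber}: total tickets = ${ticketCountBc.value}")
    }

    ticketCount
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("ticket-count-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try run(sc) finally sc.stop()
  }
}
```

For a single Long, capturing the local value directly in the closure should also work, since it is just a serializable number; the broadcast mainly pays off when the shared value is large and reused across many tasks.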