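This looks like SPARK-5063. RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. In your code, ticketCachedRdd.count is called inside the ticketRdd.foreach closure, which runs on the executors, so that is most likely where the NullPointerException comes from. The missing parentheses after cache should not matter: in Scala, ticketRdd.cache and ticketRdd.cache() are the same call. For more information, see SPARK-5063.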
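
A minimal sketch of one way around this, assuming the booking records are available as an RDD (as you describe) and that tickets and bookings can be matched on date, flight number and carrier. The Ticket and Booking case classes, their fields, and the linkTickets helper are hypothetical names made up for illustration; your real schemas will differ. The count is taken once on the driver, and the per-ticket lookup becomes a join instead of a query inside foreach:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TicketBookingLinks {
  // Hypothetical record types; the real schemas are not shown in this thread.
  case class Ticket(id: String, date: String, flightNumber: String, flightCarrier: String)
  case class Booking(id: String, date: String, flightNumber: String, flightCarrier: String)

  // Pairs every ticket with a matching booking, or None when no booking exists.
  // A ticket with several matching bookings yields one pair per booking.
  def linkTickets(tickets: RDD[Ticket], bookings: RDD[Booking]): RDD[(Ticket, Option[Booking])] = {
    val ticketsByFlight = tickets.keyBy(t => (t.date, t.flightNumber, t.flightCarrier))
    val bookingsByFlight = bookings.keyBy(b => (b.date, b.flightNumber, b.flightCarrier))
    ticketsByFlight.leftOuterJoin(bookingsByFlight).values
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ticket-booking-links").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val ticketRdd = sc.parallelize(Seq(
        Ticket("t1", "2015-08-17", "101", "AA"),
        Ticket("t2", "2015-08-17", "202", "BB"))).cache()
      val bookingRdd = sc.parallelize(Seq(
        Booking("b1", "2015-08-17", "101", "AA")))

      // The action runs on the driver, outside any closure.
      val ticketCount = ticketRdd.count()

      // Only the plain Long is captured by the closure, not an RDD.
      linkTickets(ticketRdd, bookingRdd).foreach { case (ticket, booking) =>
        val link = booking.map(_.id).getOrElse("no booking")
        println(s"ticket ${ticket.id} (of $ticketCount): $link")
      }
    } finally {
      sc.stop()
    }
  }
}
```

If the booking table actually lives in an external store rather than an RDD, the join does not apply as written, but the count still has to move out of the foreach closure and onto the driver.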

On Mon, Aug 17, 2015 at 8:13 PM, Preetam <preetam...@gmail.com> wrote:

> The error could be because of the missing brackets after the word cache -
> .ticketRdd.cache()
>
> > On Aug 17, 2015, at 7:26 AM, Priya Ch <learnings.chitt...@gmail.com> wrote:
> >
> > Hi All,
> >
> > Thank you very much for the detailed explanation.
> >
> > I have scenario like this-
> > I have rdd of ticket records and another rdd of booking records. for each ticket record, i need to check whether any link exists in booking table.
> >
> > val ticketCachedRdd = ticketRdd.cache
> >
> > ticketRdd.foreach{
> >   ticket =>
> >     val bookingRecords = queryOnBookingTable (date, flightNumber, flightCarrier) // this function queries the booking table and retrieves the booking rows
> >     println(ticketCachedRdd.count) // this is throwing Null pointer exception
> >
> > }
> >
> > Is there somthing wrong in the count, i am trying to use the count of cached rdd when looping through the actual rdd. whats wrong in this ?
> >
> > Thanks,
> > Padma Ch