I have two RDDs: leftRDD = RDD[(Long, (DetailInputRecord, VISummary, Long))] and rightRDD = RDD[(Long, com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum)].
DetailInputRecord is an object that contains (guid, sessionKey, sessionStartDate, siteId). There are 10 records in leftRDD (confirmed with leftRDD.count), and every DetailInputRecord in leftRDD has data in its members.

I do leftRDD.leftOuterJoin(rightRDD). In the code below, viEventsWithListings is leftRDD and spsLvlMetric is rightRDD:

    val viEventsWithListingsJoinSpsLevelMetric = viEventsWithListings.leftOuterJoin(spsLvlMetric).map {
      case (viJoinSpsLevelMetric) => {
        val (sellerId, ((viEventDetail, viSummary, itemId), spsLvlMetric)) = viJoinSpsLevelMetric
        println("sellerId:" + sellerId)
        println("sessionKey:" + viEventDetail.get("sessionKey"))
        println("guid:" + viEventDetail.get("guid"))
        println("sessionStartDate:" + viEventDetail.get("sessionStartDate"))
        println("siteId:" + viEventDetail.get("siteId"))
        if (spsLvlMetric.isDefined) {
          // do something
        }
      }
    }

I print each member of the DetailInputRecord (viEventDetail) of viEventsWithListings both before the leftOuterJoin and inside the map that follows it. Before the leftOuterJoin I get the correct value for each member of every record (10 records in total). Inside the map after the join, every member prints the guid value. How is this possible?

Output of the print statements after the join (all of these values are guids):

    sessionKey:27c9fbc014b4f61526f0574001b73b00
    guid:27c9fbc014b4f61526f0574001b73b00
    sessionStartDate:27c9fbc014b4f61526f0574001b73b00
    siteId:27c9fbc014b4f61526f0574001b73b00

What went wrong? I have debugged this multiple times but still cannot work out the reason.

Appreciate your help.

-- Deepak