I am joining two data frames as shown in the code below, and it is throwing a NullPointerException.
I have a number of different joins throughout the program, and Spark throws this NullPointerException randomly on one of them. The two data frames are very large (around 1 TB). I am using Spark version 1.5.2. Thanks in advance for any insights.

Regards, Prasad.

Below is the code:

```scala
val userAndFmSegment = userData.as("userdata")
  .join(
    fmSegmentData.withColumnRenamed("USER_ID", "FM_USER_ID").as("fmsegmentdata"),
    $"userdata.PRIMARY_USER_ID" === $"fmsegmentdata.FM_USER_ID"
      && $"fmsegmentdata.END_DATE" >= date_sub($"userdata.REPORT_DATE", trailingWeeks * 7)
      && $"fmsegmentdata.START_DATE" <= date_sub($"userdata.REPORT_DATE", trailingWeeks * 7),
    "inner"
  )
  .select("USER_ID", "PRIMARY_USER_ID", "FM_BUYER_TYPE_CD")
```

Log:

```
16/01/05 17:41:19 ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.spark.sql.DataFrame.withColumnRenamed(DataFrame.scala:1161)
    at DnaAgg$.getUserIdAndFMSegmentId$1(DnaAgg.scala:294)
    at DnaAgg$.main(DnaAgg.scala:339)
    at DnaAgg.main(DnaAgg.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:525)
```
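In case it helps narrow things down, here is a minimal sketch of what I have tried on my side: wrapping the join in a helper that checks both inputs up front, so a null data frame fails fast with a descriptive message instead of an NPE deep inside `withColumnRenamed`. The helper name `joinUserAndFmSegment` is my own; the column names and `trailingWeeks` come from the snippet above.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.date_sub

// Hypothetical diagnostic wrapper (not in the original job): fail fast with a
// clear message if either input data frame is null before the join runs.
def joinUserAndFmSegment(userData: DataFrame,
                         fmSegmentData: DataFrame,
                         trailingWeeks: Int): DataFrame = {
  require(userData != null, "userData is null before the join")
  require(fmSegmentData != null, "fmSegmentData is null before the join")

  // Brings the $"..." column syntax into scope, as in the original snippet
  import userData.sqlContext.implicits._

  val fm = fmSegmentData
    .withColumnRenamed("USER_ID", "FM_USER_ID")
    .as("fmsegmentdata")

  // The same cutoff date is used on both sides of the range condition
  val cutoff = date_sub($"userdata.REPORT_DATE", trailingWeeks * 7)

  userData.as("userdata")
    .join(fm,
      $"userdata.PRIMARY_USER_ID" === $"fmsegmentdata.FM_USER_ID" &&
        $"fmsegmentdata.END_DATE" >= cutoff &&
        $"fmsegmentdata.START_DATE" <= cutoff,
      "inner")
    .select("USER_ID", "PRIMARY_USER_ID", "FM_BUYER_TYPE_CD")
}
```

With this guard in place, the `require` checks have never fired for me, which suggests the inputs themselves are non-null and the NPE originates inside `withColumnRenamed` at `DataFrame.scala:1161`.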