Dear Romi, Priya, Sujt, Shivaram, and all, I have spent many days thinking about this issue but have not found a good enough solution, so I would appreciate any help. There is an RDD<StringDate> rdd1 and another RDD<StringDate, float> rdd2 (rdd2 can be a PairRDD, or a DataFrame with two columns <StringDate, float>). The StringDate values in rdd1 and rdd2 overlap but are not identical.
I would like to get a new RDD<StringDate, float> rdd3 whose StringDate values are exactly those of rdd1, and whose float is taken from rdd2 when that StringDate is present in rdd2, or NULL otherwise. That is, each row is rdd3[ i ] = <rdd1[ i ].StringDate, f>, where f is the float of the rdd2 row whose StringDate equals rdd1[ i ].StringDate, or NULL if rdd2 has no such row. Which API or function should I use? Thanks very much! Zhiliang
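To make the desired result concrete: what is described above has the shape of a left outer join keyed on StringDate. The following is only a minimal plain-Python sketch of that behaviour (it does not need Spark). The helper name left_outer_join and the sample dates are hypothetical, and it assumes each StringDate appears at most once in rdd2.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def left_outer_join(
    dates: Iterable[str],
    pairs: Iterable[Tuple[str, float]],
) -> List[Tuple[str, Optional[float]]]:
    """Keep every date from `dates`; attach its float from `pairs`, or None."""
    # Assumption: each date occurs at most once in `pairs`
    # (with duplicates, the last value wins here).
    lookup: Dict[str, float] = dict(pairs)
    return [(d, lookup.get(d)) for d in dates]


# Hypothetical sample rows standing in for rdd1 and rdd2.
rdd1_rows = ["2015-09-01", "2015-09-02", "2015-09-03"]
rdd2_rows = [("2015-09-02", 1.5), ("2015-09-04", 2.5)]

rdd3_rows = left_outer_join(rdd1_rows, rdd2_rows)
print(rdd3_rows)
# [('2015-09-01', None), ('2015-09-02', 1.5), ('2015-09-03', None)]

# On Spark, the pair-RDD method leftOuterJoin gives the same shape (PySpark form):
#   rdd3 = (rdd1.map(lambda d: (d, None))
#               .leftOuterJoin(rdd2)
#               .mapValues(lambda v: v[1]))
# Note: a distributed join does not preserve the row order of rdd1.
```

If rdd2 is a DataFrame instead, a join with the "left_outer" join type on the StringDate column should give the same rows, with null in the float column where rdd2 has no match.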