Hi,

I have two DataFrames that share a common column, Product_Id, on which I have
to perform a join.

    // readCSVToDataFrame(sqlCtx: SQLContext, path: String, schema: StructType) is my own helper
    val transactionDF = readCSVToDataFrame(sqlCtx, pathToReadTransactions, transactionSchema)
    val productDF = readCSVToDataFrame(sqlCtx, pathToReadProduct, productSchema)

As the transaction data is very large but the product data is small, I would
ideally do a broadcast join in which I broadcast productDF:

    import org.apache.spark.sql.functions.broadcast

    val productBroadcastDF = broadcast(productDF)
    val broadcastJoin = transactionDF.join(productBroadcastDF, "productId")

Or, more simply, a plain inner join should give the same result as above:

    val innerJoin = transactionDF.join(productDF, "productId")

But if I use the simple inner join I get a DataFrame with the joined values,
whereas if I use the broadcast join I get an empty DataFrame. I am not able
to explain this behavior; both should give the same result.
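To illustrate why both forms should return the same rows, here is a plain-Scala sketch with no Spark involved. It is only a model of the two strategies, not Spark's implementation, and the case classes TxnRow, ProductRow and Joined (and their fields) are hypothetical stand-ins for the real schemas. The broadcast variant builds a lookup map from the small product side and probes it with each transaction, which is the idea behind a broadcast hash join; the nested-loop variant is a naive reference inner join.

```scala
// Hypothetical row types standing in for the real transaction/product schemas.
final case class TxnRow(txnId: Int, productId: Int, amount: Double)
final case class ProductRow(productId: Int, name: String)
final case class Joined(txnId: Int, productId: Int, amount: Double, name: String)

object JoinEquivalence {
  // Broadcast-hash-join model: the small side becomes a lookup table keyed
  // on the join column (in Spark this table would be shipped to every
  // executor), and each row of the large side probes it.
  def broadcastHashJoin(txns: Seq[TxnRow], products: Seq[ProductRow]): Seq[Joined] = {
    val lookup = products.groupBy(_.productId)
    for {
      t <- txns
      p <- lookup.getOrElse(t.productId, Seq.empty)
    } yield Joined(t.txnId, t.productId, t.amount, p.name)
  }

  // Reference inner join: compare every pair of rows on the key.
  def nestedLoopJoin(txns: Seq[TxnRow], products: Seq[ProductRow]): Seq[Joined] =
    for {
      t <- txns
      p <- products
      if t.productId == p.productId
    } yield Joined(t.txnId, t.productId, t.amount, p.name)

  def main(args: Array[String]): Unit = {
    val txns = Seq(
      TxnRow(1, 10, 5.0),
      TxnRow(2, 20, 7.5),
      TxnRow(3, 10, 2.0),
      TxnRow(4, 99, 1.0) // no matching product: an inner join drops it
    )
    val products = Seq(ProductRow(10, "pen"), ProductRow(20, "ink"))

    val viaBroadcast = broadcastHashJoin(txns, products)
    val viaNestedLoop = nestedLoopJoin(txns, products)

    // Row order is not guaranteed in a distributed join, so compare sorted results.
    val byKey: Joined => (Int, Int) = j => (j.txnId, j.productId)
    println(viaBroadcast.sortBy(byKey) == viaNestedLoop.sortBy(byKey))
    println(viaBroadcast.size)
  }
}
```

Running main prints `true` and then `3`: transaction 4 has no matching product, so it is dropped under either strategy, and the remaining rows agree.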

What could have gone wrong? Has anyone faced a similar issue?


Thanks,
Prateek
