Hi all, I am running into a weird result with Spark SQL outer joins. FULL, LEFT, and RIGHT OUTER JOIN all return the same row count, which does not make sense given the data. Here are the queries I am running, with their results:

FULL OUTER JOIN:

sqlContext.sql("SELECT s.date AS edate , s.account AS s_acc , d.account AS d_acc , s.ad as s_ad , d.ad as d_ad , s.spend AS s_spend , d.spend_in_dollar AS d_spend FROM swig_pin_promo_lt s FULL OUTER JOIN dps_pin_promo_lt d ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad) WHERE s.date >= '2016-01-03' AND d.date >= '2016-01-03'").count()

RESULT: 23747

LEFT OUTER JOIN:

sqlContext.sql("SELECT s.date AS edate , s.account AS s_acc , d.account AS d_acc , s.ad as s_ad , d.ad as d_ad , s.spend AS s_spend , d.spend_in_dollar AS d_spend FROM swig_pin_promo_lt s LEFT OUTER JOIN dps_pin_promo_lt d ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad) WHERE s.date >= '2016-01-03' AND d.date >= '2016-01-03'").count()

RESULT: 23747

RIGHT OUTER JOIN:

sqlContext.sql("SELECT s.date AS edate , s.account AS s_acc , d.account AS d_acc , s.ad as s_ad , d.ad as d_ad , s.spend AS s_spend , d.spend_in_dollar AS d_spend FROM swig_pin_promo_lt s RIGHT OUTER JOIN dps_pin_promo_lt d ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad) WHERE s.date >= '2016-01-03' AND d.date >= '2016-01-03'").count()

RESULT: 23747

Was wondering if someone had encountered this issue before.
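
One possible explanation that may be worth checking: the WHERE clause filters on both s.date and d.date after the join. For a row that an outer join keeps without a match, the columns from the other side are NULL, and a comparison such as d.date >= '2016-01-03' against NULL is not true, so that row is dropped. If that is what is happening here, all three queries would reduce to the inner-join result. Below is a minimal sketch of the effect, not a Spark reproduction: it uses Python's built-in sqlite3 as a stand-in engine, and the tables s and d and their rows are made-up toy data. Only LEFT OUTER JOIN is shown because older SQLite versions do not support RIGHT or FULL OUTER JOIN; the same reasoning should apply to those.

```python
import sqlite3

# Stand-in engine (assumption): SQLite instead of Spark SQL.
# Toy tables and rows below are hypothetical, not the real data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE s (date TEXT, account TEXT, ad TEXT, spend REAL)")
cur.execute("CREATE TABLE d (date TEXT, account TEXT, ad TEXT, spend_in_dollar REAL)")
cur.executemany("INSERT INTO s VALUES (?, ?, ?, ?)", [
    ("2016-01-04", "a1", "ad1", 10.0),  # has a matching row in d
    ("2016-01-05", "a2", "ad2", 20.0),  # no matching row in d
])
cur.executemany("INSERT INTO d VALUES (?, ?, ?, ?)", [
    ("2016-01-04", "a1", "ad1", 10.0),  # matches the first row of s
    ("2016-01-06", "a3", "ad3", 30.0),  # no matching row in s
])

JOIN_ON = "ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad)"


def count(sql):
    """Run a COUNT(*) query and return the single number it yields."""
    return cur.execute(sql).fetchone()[0]


# Baseline: plain inner join.
inner_count = count(f"SELECT COUNT(*) FROM s INNER JOIN d {JOIN_ON}")

# Same shape as the queries in this post: both date filters sit in WHERE,
# so the unmatched row from s (where d.date is NULL) is filtered out.
left_where_count = count(
    f"SELECT COUNT(*) FROM s LEFT OUTER JOIN d {JOIN_ON} "
    "WHERE s.date >= '2016-01-03' AND d.date >= '2016-01-03'"
)

# Alternative: filter each table in a subquery before joining,
# so the unmatched row from s survives the outer join.
left_prefiltered_count = count(
    "SELECT COUNT(*) "
    "FROM (SELECT * FROM s WHERE date >= '2016-01-03') s "
    "LEFT OUTER JOIN (SELECT * FROM d WHERE date >= '2016-01-03') d "
    f"{JOIN_ON}"
)

print(inner_count, left_where_count, left_prefiltered_count)
```

If the same thing applies to the Spark queries, moving each date filter into a subquery on its own table (or into the ON clause, depending on the intended semantics) should make the three counts differ whenever there are unmatched rows.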