Hi,

Spark sends the smaller table to all the workers as a broadcast variable, and then joins it with the larger table partition by partition. By default, a broadcast join is used when the table size is under 10MB. See:
http://spark.apache.org/docs/1.6.1/sql-programming-guide.html#other-configuration-options
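In plain Spark SQL you can tune this through the spark.sql.autoBroadcastJoinThreshold option and make sure Spark knows the small table's size. A minimal sketch (the table and column names are hypothetical; size statistics are only available for Hive Metastore tables where ANALYZE TABLE has been run):

```sql
-- Let Spark record the size of the small table (Hive Metastore tables only)
ANALYZE TABLE small_dim COMPUTE STATISTICS noscan;

-- Raise the broadcast threshold, e.g. to ~50MB (default is 10485760 bytes = 10MB)
SET spark.sql.autoBroadcastJoinThreshold=52428800;

-- If small_dim's recorded size is under the threshold,
-- Spark plans this as a broadcast join instead of a shuffle join
SELECT f.*, d.name
FROM fact_table f
JOIN small_dim d ON f.dim_id = d.id;
```

Setting the threshold to -1 disables broadcast joins entirely.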
// maropu

On Fri, Jun 17, 2016 at 4:05 AM, kali.tumm...@gmail.com <kali.tumm...@gmail.com> wrote:
> Hi All,
>
> I have used broadcast joins in Spark/Scala applications, where I used
> partitionBy (HashPartitioner) and then persist for wide dependencies. The
> project I am currently working on is pretty much a Hive migration to
> Spark SQL; it is plain SQL, to be honest, with no Scala or Python apps.
>
> My question is: how do I achieve a broadcast join in plain Spark SQL? At
> the moment the join between the two tables is taking ages.
>
> Thanks
> Sri
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-sql-broadcast-join-tp27184.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
---
Takeshi Yamamuro