Hi,

Spark sends the smaller table to all the workers as a broadcast variable, and then joins it with the larger table partition by partition. By default, a broadcast join is used when the table size is under 10MB. See:
http://spark.apache.org/docs/1.6.1/sql-programming-guide.html#other-configuration-options
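In plain Spark SQL you can tune this through the spark.sql.autoBroadcastJoinThreshold option and make sure Spark knows the small table's size. A minimal sketch (the table and column names are hypothetical; size statistics are only available for Hive Metastore tables where ANALYZE TABLE has been run):

```sql
-- Let Spark record the size of the small table (Hive Metastore tables only)
ANALYZE TABLE small_dim COMPUTE STATISTICS noscan;

-- Raise the broadcast threshold, e.g. to ~50MB (default is 10485760 bytes = 10MB)
SET spark.sql.autoBroadcastJoinThreshold=52428800;

-- If small_dim's recorded size is under the threshold,
-- Spark plans this as a broadcast join instead of a shuffle join
SELECT f.*, d.name
FROM fact_table f
JOIN small_dim d ON f.dim_id = d.id;
```

Setting the threshold to -1 disables broadcast joins entirely.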
// maropu

On Fri, Jun 17, 2016 at 4:05 AM, kali.tumm...@gmail.com <kali.tumm...@gmail.com> wrote:
> Hi All,
>
> I have used broadcast joins in Spark/Scala applications, where I used
> partitionBy (HashPartitioner) and then persist for wide dependencies. The
> project I am currently working on is pretty much a Hive migration to
> Spark SQL; it is plain SQL, to be honest, with no Scala or Python apps.
>
> My question is: how do I achieve a broadcast join in plain Spark SQL? At
> the moment the join between the two tables is taking ages.
>
> Thanks
> Sri
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-sql-broadcast-join-tp27184.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
---
Takeshi Yamamuro