Hi Hao: I tried broadcast join with the following steps, and found that my
query is still running slow; I'm not sure whether I'm using broadcast join correctly:
1. Add "spark.sql.autoBroadcastJoinThreshold 104857600" (100MB) in
conf/spark-defaults.conf. 100MB is larger than either of my 2 tables.
2. Start ...
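Step 1 above can be sketched as a minimal spark-defaults.conf fragment (104857600 bytes is 100 * 1024 * 1024, i.e. 100MB):

```
# conf/spark-defaults.conf
# Tables smaller than this threshold (in bytes) may be broadcast to all
# workers when joining; 104857600 = 100 * 1024 * 1024 = 100MB.
spark.sql.autoBroadcastJoinThreshold  104857600
```

Note that, per the linked Spark SQL docs, automatic broadcast currently relies on table statistics, which are only available for Hive metastore tables where `ANALYZE TABLE <tableName> COMPUTE STATISTICS noscan` has been run.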
To: ...@intel.com; ssab...@gmail.com; user@spark.apache.org
Subject: Re: Re: RE: Re: Re: sparksql running slow while joining_2_tables.
Status update after I did some tests: I modified some other parameters and
found two parameters that may be relevant:
SPARK_WORKER_INSTANCES and spark.sql.shuffle.partitions
Before today I ...
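For reference, these two parameters are usually set like this (the values below are only illustrative, not a recommendation):

```
# conf/spark-env.sh — number of worker processes per node (standalone mode)
SPARK_WORKER_INSTANCES=2

# conf/spark-defaults.conf — number of partitions used when shuffling
# data for joins and aggregations (the default is 200)
spark.sql.shuffle.partitions  200
```

spark.sql.shuffle.partitions can also be changed per session with `SET spark.sql.shuffle.partitions=<n>` in Spark SQL, which is handy when experimenting.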
Thanks & best regards!
罗辉 San.Luo
----- Original Message -----
From:
To: "Cheng, Hao", "Wang, Daoyuan", "Olivier Girardot", "user"
Subject: Re: RE: Re: Re: sparksql running slow while joining_2_tables.
Date: 2015-05-06 09:51
db has 1.7 million ...
You can use:
EXPLAIN EXTENDED SELECT …
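For illustration only (the table and column names below are made up), an EXPLAIN EXTENDED statement looks like this; it prints the parsed, analyzed, optimized, and physical plans for the query:

```sql
-- Hypothetical tables; substitute your own join query.
EXPLAIN EXTENDED
SELECT a.id, b.value
FROM table_a a
JOIN table_b b ON a.id = b.id;
```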
From: luohui20...@sina.com [mailto:luohui20...@sina.com]
Sent: Tuesday, May 05, 2015 9:52 AM
To: Cheng, Hao; Olivier Girardot; user
Subject: Re: RE: Re: Re: sparksql running slow while joining_2_tables.
As far as I know, broadcast join is automatically enabled via
spark.sql.autoBroadcastJoinThreshold; refer to
http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options
And how can I check my app's physical plan, and other things like the
optimized plan, executable plan, etc.?
thanks
-
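Besides EXPLAIN EXTENDED in SQL, the plans asked about above can also be inspected programmatically; a minimal Scala sketch (Spark 1.x-era API, assuming an existing SQLContext and a hypothetical registered table "t"):

```scala
// Assumes a running SparkContext/SQLContext; the table name is hypothetical.
val df = sqlContext.sql("SELECT * FROM t")

// Prints the parsed, analyzed, optimized, and physical plans in one go:
df.explain(true)

// Or access the individual plans directly:
val qe = df.queryExecution
println(qe.logical)        // parsed logical plan
println(qe.analyzed)       // analyzed logical plan
println(qe.optimizedPlan)  // optimized logical plan
println(qe.executedPlan)   // physical (executable) plan
```

If broadcast join kicked in, the physical plan should contain a BroadcastHashJoin operator instead of a shuffle-based join.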