Hi Hao,

I tried broadcast join with the following steps, but my query is still running
slowly. I'm not sure whether I'm using broadcast join correctly:

1. Added "spark.sql.autoBroadcastJoinThreshold   104857600" (100 MB) to
conf/spark-defaults.conf. 100 MB is larger than either of my 2 tables.
2. Started bin/spark-sql and confirmed that this setting took effect, both on
the Environment page of my Spark cluster web UI and in the spark-sql console.
3. Ran "ANALYZE TABLE db1 COMPUTE STATISTICS noscan" and "ANALYZE TABLE sample3
COMPUTE STATISTICS noscan", and cached both tables.
4. Printed the extended plan for my query (EXPLAIN EXTENDED) and confirmed that
BroadcastHashJoin is used in the physical plan.
5. Ran my query "select a.chrname,a.startpoint,a.endpoint, a.piece from db1 a
join sample3 b on (a.chrname = b.name) where (b.startpoint > a.startpoint + 25)
and b.endpoint <= a.endpoint;"

If there are any mistakes in my steps, please point them out. Thanks.
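
For context on step 4: conceptually, a broadcast hash join builds a hash table from the smaller table on the equality key (chrname = name) and probes it with rows of the other table. The range conditions in the WHERE clause cannot use the hash table, so they are checked afterwards, once per key-matched pair. The plain-Python sketch below illustrates only that idea. It is not Spark's implementation, and the function name broadcast_hash_join and the sample rows are made up for illustration.

```python
from collections import defaultdict


def broadcast_hash_join(big_rows, small_rows):
    """Conceptual hash join on chrname = name with a residual range filter.

    Returns the matching output rows and the number of key-matched
    pairs that had to be checked against the range predicates.
    """
    # Build side: hash the smaller table on its join key.
    build = defaultdict(list)
    for b in small_rows:
        build[b["name"]].append(b)

    out = []
    pairs_examined = 0
    # Probe side: look up each row of the larger table by its key.
    for a in big_rows:
        for b in build.get(a["chrname"], []):
            pairs_examined += 1
            # Residual predicates from the WHERE clause, evaluated per pair.
            if (b["startpoint"] > a["startpoint"] + 25
                    and b["endpoint"] <= a["endpoint"]):
                out.append((a["chrname"], a["startpoint"],
                            a["endpoint"], a["piece"]))
    return out, pairs_examined


# Hypothetical sample rows, not data from the thread.
db1 = [
    {"chrname": "chr1", "startpoint": 100, "endpoint": 500, "piece": "p1"},
    {"chrname": "chr2", "startpoint": 0, "endpoint": 50, "piece": "p2"},
]
sample3 = [
    {"name": "chr1", "startpoint": 130, "endpoint": 400},  # passes both predicates
    {"name": "chr1", "startpoint": 120, "endpoint": 400},  # fails startpoint check
    {"name": "chr2", "startpoint": 30, "endpoint": 60},    # fails endpoint check
]

rows, examined = broadcast_hash_join(db1, sample3)
print(rows, examined)
```

If chrname has only a few distinct values (an assumption; the thread does not say), almost every pair of rows sharing a key is examined, so the query could stay slow even when BroadcastHashJoin appears in the physical plan.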




--------------------------------

 

Thanks & Best regards!
San.Luo

----- Original Message -----
From: "Cheng, Hao" <[email protected]>
To: "Cheng, Hao" <[email protected]>, "[email protected]" 
<[email protected]>, Olivier Girardot <[email protected]>, user 
<[email protected]>
Subject: RE: Re: Re: sparksql running slow while joining 2 tables.
Date: 2015-05-05 08:38





Or have you ever tried a broadcast join?
 


From: Cheng, Hao [mailto:[email protected]]


Sent: Tuesday, May 5, 2015 8:33 AM

To: [email protected]; Olivier Girardot; user

Subject: RE: Re: Re: sparksql running slow while joining 2 tables.


 
Can you print out the physical plan?
 
EXPLAIN SELECT xxx…
 
From: [email protected] [mailto:[email protected]]


Sent: Monday, May 4, 2015 9:08 PM

To: Olivier Girardot; user

Subject: Re: Re: sparksql running slow while joining 2 tables.
 
Hi Olivier,
I'm using Spark 1.3.1 with Java 1.8.0.45,
and I have attached 2 pictures.
It seems like a GC issue. I also tried different parameters, such as the memory
size of the driver and executor, the memory fraction, Java opts...
but the issue still happens.
 

--------------------------------



 

Thanks & Best regards!

罗辉 San.Luo

 


----- Original Message -----

From: Olivier Girardot <[email protected]>

To: [email protected], user <[email protected]>

Subject: Re: sparksql running slow while joining 2 tables.

Date: 2015-05-04 20:46

 

Hi, 

What is your Spark version?

 


Regards, 


 


Olivier.


 

On Mon, May 4, 2015 at 11:03, <[email protected]> wrote:

Hi guys,
When I run a SQL query like "select 
a.name,a.startpoint,a.endpoint, a.piece from db a join sample b on (a.name =
b.name) where (b.startpoint > a.startpoint + 25);" I find that Spark SQL runs 
slowly, taking minutes, which may be caused by very long GC and shuffle times.
 
Table db is created from a 56 MB txt file, while table sample is 26 MB; both 
are small.
My Spark cluster is a standalone pseudo-distributed cluster with an 8 GB 
executor and a 4 GB driver.
Any advice? Thank you, guys.
 
 

--------------------------------



 

Thanks & Best regards!

罗辉 San.Luo








